Universidade de Pernambuco
Escola Politécnica de Pernambuco
Departamento de Sistemas e Computação
Programa de Pós-Graduação em Engenharia da Computação
Renata Wanderley Medeiros
Scheduling Parallel Jobs for Multiphysics
Simulators
Dissertação de Mestrado
Recife, 28 de janeiro 2010
UNIVERSIDADE DE PERNAMBUCO
DEPARTAMENTO DE SISTEMAS E COMPUTAÇÃO
PROGRAMA DE PÓS-GRADUAÇÃO EM ENGENHARIA DA COMPUTAÇÃO
RENATA WANDERLEY MEDEIROS
Scheduling Parallel Jobs for Multiphysics
Simulators
Thesis presented in partial fulfillment
of the requirements for the degree of
Master of Computer Science
Prof. Dr. Ricardo Massa Ferreira Lima
Advisor
Prof. Dr. Felix Christian Guimarães Santos
Coadvisor
Recife, 28 de janeiro 2010
CIP – CATALOGING-IN-PUBLICATION
Medeiros, Renata Wanderley
Scheduling Parallel Jobs for Multiphysics Simulators / Renata
Wanderley Medeiros. – Recife: PPGEC da UPE, 2010.
125 f.: il.
Thesis (Master) – Universidade de Pernambuco. Programa de
Pós-Graduação em Engenharia da Computação, Recife, BR–PE,
2010. Advisor: Ricardo Massa Ferreira Lima; Coadvisor: Felix
Christian Guimarães Santos.
1. Scheduling. 2. Genetic Algorithms. 3. Coloured Petri Nets.
4. MPhyScaS. I. Lima, Ricardo Massa Ferreira. II. Santos, Felix
Christian Guimarães. III. Título.
To my parents and aunt, who emphasized the importance
of education and helped me with my lessons throughout their lives.
And to my husband, who has been proud of my work and
who has shared the many challenges for completing this dissertation.
Acknowledgment
It is a pleasure to thank those who made this dissertation possible, such as my parents,
Olavio and Ione Medeiros, and my aunt, Orquidia, who always gave me affection and
attention and also gave me the moral support to make me who I am.
I am heartily thankful to my husband, Danilo Carvalho, for his help with the emotional
and intellectual portions of this dissertation.
To my sister, Roberta, and her family, Adriano and Luísa, a steady presence in my life,
for the happiness and understanding always given to me.
I also would like to make a special reference to Professor Ricardo Massa, who still
contributes to my academic life, providing knowledge and guiding me through my studies.
In addition, I thank Professors Felix Christian and Sérgio Soares, who introduced
me to the MPhyScaS project and enabled me to develop an understanding of the subject.
To all other Professors of the Department of Computing and Systems of Pernambuco
University, who gave me the best instruction during the master's course.
Lastly, I offer my regards and blessings to my colleagues César Lins and Fernando
Rocha, and to all of those who supported me in any respect during the completion of the
project.
Contents
LIST OF FIGURES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vii
RESUMO . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ix
ABSTRACT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . x
1 INTRODUCTION . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.1 Algorithms Used in Scheduling . . . . . . . . . . . . . . . . . . . . . . . . 2
1.2 The MPhyScaS problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.3 Methodology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.4 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.5 Organization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
2 MPHYSCAS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
2.1 MPhyScaS Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2.2 MPhyScaS Parallel Architecture . . . . . . . . . . . . . . . . . . . . . . . 13
3 SCHEDULING ALGORITHMS . . . . . . . . . . . . . . . . . . . . . . . . 19
3.1 Classification of Scheduling Problems . . . . . . . . . . . . . . . . . . . . 20
3.1.1 Machine Environment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
3.1.2 Job Characteristics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
3.1.3 Optimality Criterion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
3.2 Compared Scheduling Algorithms . . . . . . . . . . . . . . . . . . . . . . 24
3.2.1 List Scheduling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
3.2.2 Longest Processing Time - LPT . . . . . . . . . . . . . . . . . . . . . . . . 26
3.2.3 Shortest Processing Time - SPT . . . . . . . . . . . . . . . . . . . . . . . . 27
4 IDENTIFYING JOB DEPENDENCIES . . . . . . . . . . . . . . . . . . . 33
4.1 Petri Nets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
4.2 Coloured Petri Nets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
4.3 MPhyScaS - Petri Net Model . . . . . . . . . . . . . . . . . . . . . . . . . 40
4.4 Model Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
5 PROPOSED ALGORITHM . . . . . . . . . . . . . . . . . . . . . . . . . . 55
5.1 Genetic Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
5.1.1 Initial Population . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
5.1.2 Selection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
5.1.3 Crossover . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
5.1.4 Mutation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
5.1.5 Elitism . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
5.1.6 Termination . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
5.2 Proposed Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
5.2.1 Individual Representation . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
5.2.2 Selection and Elitism . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
5.2.3 Crossover . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
5.2.4 Mutation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
5.2.5 Fitness Function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
6 CASE STUDY . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
6.1 Basics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
6.2 Kernel . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
6.2.1 System Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
6.3 Global States . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
6.4 Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
6.5 QuantityTasks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
7 EXPERIMENTS AND RESULTS . . . . . . . . . . . . . . . . . . . . . . . 76
7.1 The First Abstract Simulator . . . . . . . . . . . . . . . . . . . . . . . . . 77
7.2 The Second Abstract Simulator . . . . . . . . . . . . . . . . . . . . . . . . 80
7.3 Case Study . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
8 CONCLUSIONS AND FUTURE WORKS . . . . . . . . . . . . . . . . . . 86
8.1 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
8.2 Future Works . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
APPENDIX A CASE STUDY . . . . . . . . . . . . . . . . . . . . . . . . . . . 90
A.1 Kernel Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90
A.2 System Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
A.3 Global States . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
A.4 Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92
A.4.1 Algorithm 0 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92
A.4.2 Algorithm 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94
A.4.3 Algorithm 2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94
A.4.4 Algorithm 3 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98
A.4.5 Algorithm 4 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
A.4.6 Algorithm 5 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103
A.4.7 Algorithm 6 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104
A.4.8 Algorithm 7 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104
A.4.9 Algorithm 8 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 106
A.5 QuantityTasks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107
A.5.1 QuantityTasks for Group 0 . . . . . . . . . . . . . . . . . . . . . . . . . . 107
A.5.2 QuantityTasks for Group 1 . . . . . . . . . . . . . . . . . . . . . . . . . . 111
A.5.3 QuantityTasks for Group 2 . . . . . . . . . . . . . . . . . . . . . . . . . . 117
A.5.4 QuantityTasks for Group 3 . . . . . . . . . . . . . . . . . . . . . . . . . . 118
REFERENCES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119
List of Figures
Figure 1.1: An overview of the whole process proposed. . . . . . . . . . . . . . 5
Figure 2.1: Computational representation for the layers of the simulator. . . . . . 9
Figure 2.2: Each Phenomenon is capable of computing a set of quantities. . . . . . 12
Figure 2.3: Each Phenomenon has its own set of States, which is stored in its Group. 12
Figure 2.4: A quantity computed by a Phenomenon may be coupled to a State
from another Phenomenon. . . . . . . . . . . . . . . . . . . . . . . . 13
Figure 2.5: Hierarchy of the simulator in MPhyScaS-P. . . . . . . . . . . . . . . 17
Figure 2.6: Layers with procedures executed by ClusterRank in MPhyScaS-P. . . 17
Figure 2.7: Layers with procedures executed by MachineRank in MPhyScaS-P. . 18
Figure 2.8: Layers with procedures executed by ProcessRank in MPhyScaS-P. . . 18
Figure 3.1: Example of a job shop scheduling problem. . . . . . . . . . . . . . . 22
Figure 3.2: Example of a flow shop scheduling problem. . . . . . . . . . . . . . 22
Figure 3.3: Example of an open shop scheduling problem. . . . . . . . . . . . . . 22
Figure 3.4: Example of LPT schedules. . . . . . . . . . . . . . . . . . . . . . . 31
Figure 3.5: Example of SPT schedules. . . . . . . . . . . . . . . . . . . . . . . . 32
Figure 4.1: An illustration of a transition (firing) rule: (a) the marking before
firing the enabled transition t; (b) the marking after firing t, where t is
disabled. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
Figure 4.2: Graphical representation of Petri nets PN1 and PN2. . . . . . . . . . . 42
Figure 4.3: Graphical representation of Petri nets PN3 and PN4. . . . . . . . . . . 43
Figure 4.4: Graphical representation of Petri nets PN5, PN6 and PN7. . . . . . . . 45
Figure 4.5: Graphical representation of Petri nets PN8 and PN9. . . . . . . . . . . 46
Figure 4.6: Graphical representation of Petri nets PN10, PN11 and PN12. . . . . . 48
Figure 4.7: Graphical representation of Petri nets PN13 and PN14. . . . . . . . . . 50
Figure 4.8: An example of a Petri net that represents the execution of two Group-
Tasks. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
Figure 4.9: The Petri net with arcs direction inverted. . . . . . . . . . . . . . . . 52
Figure 4.10: The Petri net representing the execution of the first executed task. . . 53
Figure 4.11: The Petri net representing the execution of the second executed task. . 54
Figure 5.1: Fitness-proportionate selection example. . . . . . . . . . . . . . . . . 59
Figure 5.2: Tournament selection example. . . . . . . . . . . . . . . . . . . . . . 60
Figure 5.3: Single-point crossover example. . . . . . . . . . . . . . . . . . . . . 61
Figure 5.4: Multipoint crossover example. . . . . . . . . . . . . . . . . . . . . . 61
Figure 5.5: Uniform crossover example. . . . . . . . . . . . . . . . . . . . . . . 62
Figure 5.6: Mutation example. . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
Figure 5.7: An example of a scheduling individual. . . . . . . . . . . . . . . . . 65
Figure 5.8: First step of crossover individuals. . . . . . . . . . . . . . . . . . . . 66
Figure 5.9: Second step of crossover individuals. . . . . . . . . . . . . . . . . . 66
Figure 5.10: First step of crossover individuals that generates unfeasible offspring. 67
Figure 5.11: Second step of crossover individuals that generates unfeasible offspring. 67
Figure 5.12: An example of the proposed mutation. . . . . . . . . . . . . . . . . . 68
Figure 5.13: An example of the proposed individual representation. . . . . . . . . 69
Figure 7.1: The convergence of the proposed genetic algorithm for 50 and 100
individuals using first architecture. . . . . . . . . . . . . . . . . . . . 79
Figure 7.2: The convergence of the proposed genetic algorithm for 50 and 100
individuals using second architecture. . . . . . . . . . . . . . . . . . 80
Figure 7.3: The convergence of the proposed genetic algorithm for 50 and 100
individuals using the first architecture. . . . . . . . . . . . . . . . . . 81
Figure 7.4: The convergence of the proposed genetic algorithm for 50 and 100
individuals using the second architecture. . . . . . . . . . . . . . . . 83
Figure 7.5: The convergence of the proposed genetic algorithm for 50 and 100
individuals using the first architecture. . . . . . . . . . . . . . . . . . 85
Figure 7.6: The convergence of the proposed genetic algorithm for 50 and 100
individuals using the second architecture. . . . . . . . . . . . . . . . 85
Resumo
Simulações de problemas reais envolvendo fenômenos físicos podem exigir muito tempo
para serem executados. Para melhorar o desempenho dessas simulações é necessário ter
uma abordagem de paralelização dos processos que compõem a simulação.
MPhyScaS (Multi-Physics and Multi-Scale Solver Environment) é um ambiente de-
dicado ao desenvolvimento automático de simuladores. Cada simulação do MPhyScaS
exige muito tempo de execução. Para paralelizar as simulações do MPhyScaS a abor-
dagem usada deve fazer uso do conceito de camadas definido na arquitetura do MPhyScaS
com o intuito de definir uma estrutura de paralelização hierárquica.
O objetivo deste trabalho é melhorar o desempenho das simulações do MPhyScaS,
que são compostas por um conjunto de tarefas dependentes umas das outras, escalonando
essas tarefas. O modelo apresentado é baseado em Algoritmos Genéticos (GA – Genetic
Algorithms) para escalonar tarefas paralelas de acordo com as restrições de dependências
impostas pela arquitetura do MPhyScaS. A comunicação entre os processadores tam-
bém deve ser considerada neste processo. Portanto, deve-se achar um equilíbrio entre a
execução dos processos e o tempo necessário para que esses processos se comuniquem.
A técnica de GA geralmente é simples, mas para ser aplicada em problemas de escalon-
amento deve ter informações a respeito das dependências entre processos. Para obter es-
sas informações outra técnica chamada Redes de Petri Coloridas (CPN – Coloured Petri
Nets) foi utilizada. Esta última técnica modela os dados de uma simulação do MPhyScaS
e analisa o modelo gerado identificando os processos paralelos e os dependentes.
Vários experimentos, incluindo um estudo de caso, foram realizados utilizando as
duas técnicas mencionadas. Esses experimentos produziram resultados satisfatórios, o
que demonstra a viabilidade da nossa contribuição e sua capacidade de ser aplicada aos
problemas do MPhyScaS.
Keywords: Scheduling, Genetic Algorithms, Coloured Petri Nets, MPhyScaS.
Abstract
Scheduling Parallel Jobs for Multiphysics Simulators
Real problem simulations involving physical phenomena can demand a great deal of exe-
cution time. To improve the performance of these simulations it is necessary to have an
approach to parallelize the processes that compose the simulation.
MPhyScaS (Multi-Physics and Multi-Scale Solver Environment) is an environment
dedicated to the automatic development of simulators. Each MPhyScaS simulation de-
mands a great amount of time. To parallelize MPhyScaS simulations, the approach used
should make use of the concept of layers already used in MPhyScaS in order to define a
hierarchical parallel structure.
The aim of the work herein presented is to improve the performance of clusters in the
processing of MPhyScaS simulations, which are composed of a set of dependent tasks,
by scheduling these tasks. The presented model is based on Genetic Algorithms (GA) to
schedule the parallel tasks following the dependence restrictions of the MPhyScaS architecture.
The communication between processors must also be considered in this scheduling. There-
fore, a trade-off must be found between the execution of processes and the time necessary
for these processes to communicate with each other.
GA are generally simple, but to be applied to scheduling problems they have to know
about the dependencies between the processes. In order to obtain this information, we
use another technique called Coloured Petri Nets (CPN). The latter technique models
MPhyScaS simulation data and analyzes the model, identifying the parallel processes and
the dependent ones.
Several experiments, including a case study, were carried out making use of both
techniques. These experiments produced satisfactory results that demonstrate how useful
our contribution is and its capacity to be applied to MPhyScaS.
Keywords: Scheduling, Genetic Algorithms, Coloured Petri Nets, MPhyScaS.
Chapter 1
Introduction
Scheduling is a decision-making process that is used on a regular basis in many systems.
It deals with the allocation of resources to tasks over given time periods and its goal is
to optimize one or more objectives. Scheduling, as a decision-making process, plays
an important role in most systems as well as in most information processing environ-
ments (PINEDO, 2008). For the type of system herein presented (computing system), the
resources are the processors available for execution, and the tasks are the processes that
must be executed. A process is a part of a running software program or other computing
operation that does a single task.
The first scheduling algorithms were formulated in the mid-fifties. Since then there has
been a growing interest in scheduling. During the seventies, computer scientists discovered
scheduling as a tool for improving the performance of computer systems (BRUCKER,
2006). Furthermore, scheduling problems have been investigated and classified with re-
spect to their computational complexity. During the last few years, new and interesting
scheduling problems have been formulated in connection with flexible manufacturing.
While sequential formulation of a problem can lead to a solution, a tremendous perfor-
mance advantage is available from doing many operations in parallel. The two principal
approaches to speed up a computation are a faster clock rate for the underlying hard-
ware and doing more operations in parallel. Introducing parallel operations to speed up
an application is a promising approach, because as tasks become larger, more operations
can potentially be done in parallel. To explore this potential, three things must work
together: i) algorithms must involve many independent operations; ii) programming lan-
guages must allow the specification of parallel operations or identify them automatically;
iii) the architecture of the computer running the program must execute multiple operations
simultaneously (JORDAN; ALAGHBAND, 2003).
An effective way to extract the most performance and scalability out of a parallel
application is to exploit both task parallelism and data parallelism (AIDA; CASANOVA,
2008). With this approach, which is typically termed mixed parallelism (RAMASWAMY;
SAPATNEKAR; BANERJEE, 1997), an application consists of tasks organized in a
Directed Acyclic Graph (DAG) in which each edge corresponds to a precedence relation
between two tasks, implying possible data communication. The common approach to
schedule DAGs is the task parallel paradigm, which assigns one task to one processor.
The scheduling consists of the distribution of the DAG nodes among the machine nodes,
so that the makespan is minimized.
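As a minimal illustration, the sketch below (Python; the task names, durations and allocation are invented) represents an application as a DAG and computes the makespan of a given task-to-processor assignment by starting each task as early as its predecessors and its processor allow. Communication costs are ignored here for brevity, and tasks on the same processor are simply executed in topological order.

# A minimal sketch with hypothetical data: tasks organised in a DAG, one task per processor.
# duration[t] is the task weight; deps[t] lists the tasks that must finish before t starts.
duration = {"A": 3, "B": 2, "C": 4, "D": 1}
deps = {"A": [], "B": ["A"], "C": ["A"], "D": ["B", "C"]}
alloc = {"A": 0, "B": 0, "C": 1, "D": 0}           # task -> processor

def topological_order(deps):
    order, seen = [], set()
    def visit(t):
        if t not in seen:
            seen.add(t)
            for d in deps[t]:
                visit(d)
            order.append(t)
    for t in deps:
        visit(t)
    return order

def makespan(duration, deps, alloc):
    finish = {}                                     # task -> finish time
    proc_free = {}                                  # processor -> time it becomes free
    for task in topological_order(deps):
        ready = max((finish[d] for d in deps[task]), default=0)
        start = max(ready, proc_free.get(alloc[task], 0))
        finish[task] = start + duration[task]
        proc_free[alloc[task]] = finish[task]
    return max(finish.values())

print(makespan(duration, deps, alloc))              # prints 8 for the data above
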
Early scheduling algorithms did not take communication into account, but due to the
increasing gap between computation and communication performance of parallel systems,
the consideration of communication became more important (SINNEN; SOUSA,
2001) and is included in recently proposed scheduling algorithms (KWOK; AHMAD,
1998).
1.1 Algorithms Used in Scheduling
The problem of optimally assigning jobs to the machines in a grid has been shown,
in general, to be NP-complete. Since an exhaustive search is often impractical, most of the
work has been done on fast heuristic methods that find suboptimal solutions (KANG et al.,
2008). The most studied heuristic methods for scheduling problems are List Scheduling
(SIMION et al., 2007), Longest Processing Time (LPT) (BLAZEWICZ et al., 2007),
and Shortest Processing Time (SPT) (LEUNG; KELLY; ANDERSON, 2004). These
heuristics are of the constructive type. They start without a schedule and gradually
construct a schedule by adding one job at a time.
To achieve better solution quality, modern meta-heuristics have been presented for
the scheduling problem, such as Genetic Algorithms (MOATTAR; RAHMANI; DERAKHSHI,
2007), Simulated Annealing (XU; SUN; YANG, 2007), and Tabu Search (CHEN;
LIN, 2000). These algorithms are of the improvement type, which is conceptually
completely different from algorithms of the constructive type. They start out with a complete
schedule, which may be selected arbitrarily, and then try to obtain a better schedule by
manipulating the current schedule.
Among these methods, GA (Genetic Algorithms), a meta-heuristic and optimization
technique, has emerged as a tool that is beneficial for a variety of fields of study, including
construction applications, since its introduction in the 1960s by Holland (HOLLAND,
1992). Several studies have been done to solve the scheduling problem using GA (KIM,
2007). GA has also been used successfully to solve construction management problems,
including resource scheduling with a small number of activities (HEGAZ; KASSAB,
2003).
The reason for GA's success at a wide and ever-growing range of scheduling problems
is a combination of power and flexibility (MONTANA et al., 1998). The power derives
from the empirically proven ability of evolutionary algorithms to efficiently find globally
competitive optima in large and complex search spaces. The favorable scaling of
evolutionary algorithms as a function of the search space makes them particularly effective
in comparison with other search algorithms for the large spaces typical of real-world
scheduling (MONTANA et al., 1998).
The flexibility of Genetic Algorithms has multiple facets. Even the standard Genetic
Algorithm (i.e., bit string representation with traditional crossover and mutation operators)
can effectively handle problems that many traditional optimization algorithms cannot.
Some examples of these problems are: discrete spaces; nonlinear, discontinuous
evaluation functions; and nonlinear, discontinuous constraints (BANZHAF et al., 1998). The
use of non-standard Genetic Algorithms and the tailoring of the representation, operators,
initialization method, etc. to fit the problem greatly increases the range of problems to
which GA can be effectively applied.
1.2 The MPhyScaS problem
MPhyScaS (Multi-Physics and Multi-Scale Solver Environment) is an environment ded-
icated to the automatic development of simulators based on the Finite Element Method
(FEM (LOGAN, 2002)). The term multi-physics is a qualifier to a set of phenomena that
interact in time and space. A multi-physics system can also be called a system of coupled
phenomena. These phenomena are of different natures and behavior scales.
Simulators based on the Finite Element Method can be organized in a layered architecture
(SANTOS; BRITO; BARBOSA, 2006a). The definition of the layers is important to
the system modularity, but this does not indicate either how entities belonging to different
layers interact or what data they share or depend upon. This is certainly very important
for the definition of abstractions that can standardize how layers interact and behave. The
definition of the MPhyScaS architecture is aimed at improving the quality of simulator
designs. The defined architecture attempts to fill the existing gap in the development of
FEM simulators for multi-physics and multi-scale problems.
MPhyScaS simulates real problems involving a set of different physical phenomena.
Each MPhyScaS simulation demands a great amount of time. To improve the performance
of MPhyScaS simulations it is necessary to have an approach to parallelize the processes
that compose the simulation. This approach should define the distribution of processes
and their relationship across a cluster of PCs. It also should make use of the concept
of layers, which is already used in MPhyScaS, in order to define a hierarchical parallel
structure.
1.3 Methodology
Scheduling techniques need to know the dependencies between processes. This makes it
important to obtain this information automatically. For this purpose, we studied MPhyScaS
data in order to identify these dependencies, which can be data dependencies or dependencies
inherited from the MPhyScaS architecture.
In order to analyze MPhyScaS data, we developed a Petri net model which represents
the elements and their relationship. We adopt a Coloured Petri Net (CPN), in which the
token stores the task being executed and its execution timestamps. This model permits
us to analyze all levels of the architecture. Figure 1.1 shows that we also developed a
compiler to create a CPN according to MPhyScaS data. In the same figure, one can note
that we use the CPNTools to analyze the CPN model and provide the dependencies as
the results. The analysis done in CPNTools has the aim of finding parallel processes in
each architecture level and causal dependencies between the processes. These restrictions
can inform, for example, that a process can only be executed after another one or that a
process needs some information calculated from another one.
Figure 1.1: An overview of the whole process proposed (MPhyScaS data is compiled into a CPN, CPNTools analyzes the CPN and produces the dependencies, and the dependencies together with the computer architecture feed the GA, which produces the schedule).
Each object of the MPhyScaS architecture represents one non-preemptive process. In
MPhyScaS, the computation time of each process and the communication cost between
processes, due to the size of required data, are known. Thus, we use an offline scheduling
approach. This approach groups processes that communicate large amounts of data with
each other on the same processor. This decreases the data volume through the network,
decreasing the communication cost of a simulation. In our approach we also consider a
finite number of processors.
We defined metrics to evaluate the scheduling algorithm proposed in this work. Such
metrics simultaneously evaluate three optimization criteria: i) the total execution time
(makespan); ii) the total idle time of each processor (waiting time); iii) the time spent
on communication between processes.
The complexity of the problem relies on the data dependence, because the size of
some required data might be very large, increasing communication costs. Moreover, the
algorithm also takes into account the architecture of the computers where the simulations
will run. Figure 1.1 shows that the algorithm chosen to schedule processes (Genetic
Algorithms - GA) uses the dependencies found in the CPN analysis and also considers the
architecture of the computers to find a schedule for the simulation.
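As a rough sketch of how such criteria can be combined into a single objective, the fragment below (Python; the weights and the example measurements are invented, not taken from this work) aggregates makespan, total waiting time and total communication time of a candidate schedule into one score to be minimized; the actual fitness function used in this work is defined in Chapter 5.

# Hypothetical weights; the dissertation's actual fitness function is defined in Chapter 5.
W_MAKESPAN, W_WAITING, W_COMM = 1.0, 0.5, 0.5

def fitness(makespan, waiting_time, comm_time):
    # Lower is better: the GA would minimise this aggregated value.
    return W_MAKESPAN * makespan + W_WAITING * waiting_time + W_COMM * comm_time

# Example with invented measurements for one candidate schedule.
print(fitness(makespan=120.0, waiting_time=15.0, comm_time=30.0))   # 142.5
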
1.4 Related Work
Xu et al. (XU; SUN; YANG, 2007) used Simulated Annealing (SA) for the fixed job
scheduling problem. Their objective was to minimize the assignment cost of the sequential
network model. In (KANG et al., 2008), Kang et al. proposed a discrete variant of Particle
Swarm Optimization (PSO) to solve job scheduling problems in which all tasks are
non-preemptive and independent of each other. Thus, there is no communication cost to be
worried about.
Chong et al. (CHONG et al., 2006) relied on the nectar collection of honey bee colonies to
create an algorithm for solving the scheduling problem. They consider only the makespan
function to compare their algorithm. The algorithms cited above are of the improvement
type. They start out with a complete schedule, which may be selected arbitrarily, and then
try to obtain a better schedule by manipulating the current schedule.
Among these methods, the Genetic Algorithm (GA) has emerged as a tool that is beneficial
for a variety of fields of study. Several studies have been done to solve the scheduling
problem using GA. Kim (KIM, 2007) proposed a permutation-based elitist genetic algorithm
that used a serial schedule generation scheme for solving a large-sized multiple
resource-constrained project scheduling problem. Kim considers the number of resources,
but does not consider waiting time and communication costs. Moattar et al. (MOATTAR;
RAHMANI; DERAKHSHI, 2007) proposed a GA-based algorithm that finds schedules
where jobs are partitioned between processors and in which total finishing time and waiting
time are minimized. As we did, they used a fitness function based on aggregation to
optimize two criteria simultaneously, but they did not aggregate the communication cost into
their function. Yang et al. (YANG et al., 2001) also rely on GA for solving the scheduling
problem, but they have a different optimization criterion: to close the gap between the
specification of concurrent, communicating processes and the heterogeneous processor target
without compromising required real-time performance and cost-effectiveness.
1.5 Organization
This dissertation is organized as follows. Chapter 2 contains the MPhyScaS definition
and its properties. It also presents the MPhyScaS architecture and the explanation of
each of its levels. In Chapter 3 the scheduling concepts are introduced and
some scheduling algorithms are presented. Chapter 4 introduces the concepts of Petri
nets, shows an extension of Petri nets called Coloured Petri Nets (CPN), and presents a model
definition based on CPN to represent MPhyScaS data, which will help to identify dependencies
between the MPhyScaS architecture objects. Chapter 5 introduces the concepts of
Genetic Algorithms (GA) and presents a new GA approach that considers some new effects
in the scheduling process, aimed at solving the MPhyScaS scheduling problem. In Chapter 6
an MPhyScaS case study is detailed, which will be part of the experiments. Chapter 7
shows the experiments and the results obtained in the application of the proposed technique
compared to others. Chapter 8 concludes this work and presents some plans for
future work.
Chapter 2
MPhyScaS
MPhyScaS (Multi-Physics and Multi-Scale Solver Environment) is an environment ded-
icated to the automatic development of simulators based on the Finite Element Method
(FEM (LOGAN, 2002)). The term multi-physics is a qualifier to a set of phenomena that
interact in time and space. A multi-physics system can also be called a system of coupled
phenomena. These phenomena are of different natures and behavior scales.
If two phenomena are coupled, it means that a part of one phenomenon's data depends
on information from the other phenomenon. Such a dependence may occur in any geometric
part where both phenomena are defined. Another type of data dependence is the case where
two or more phenomena are defined on the same geometric component and share the
geometric mesh.
Usually, simulators based on the Finite Element Method can be organized in a layered
architecture (SANTOS; BRITO; BARBOSA, 2006a). In the top layer, global iterative
loops (for time stepping, model adaptation and articulation of several blocks of solution
algorithms) can be found. This corresponds to the overall scenery of the simulation. The
second layer contains what is called the solution algorithms. Each solution algorithm dic-
tates the way linear systems are built and solved. It also defines the type of all operations
involving matrices, vectors and scalars, and the moment when they have to be performed.
The third layer contains the solvers for linear systems and all the machinery for operat-
ing with matrices and vectors. This layer is the place where all global matrices, vectors
and scalars are located. The last layer is the phenomenon layer, which is responsible for
computing local matrices and vectors at the finite element level and assembling them into
global data structures.
The definition of the layers is important to the system modularity, but this does not indicate
either how entities belonging to different layers interact or what data they share or
depend upon. This is certainly very important for the definition of abstractions that can
standardize how layers interact and behave.
MPhyScaS provides a pattern language to define and represent not only the set of
entities of each layer but also the data transfer and the services between layers. Thus,
MPhyScaS can be considered as a framework that combines a number of computational
entities, which are defined based on the pattern language (SANTOS; BRITO; BARBOSA,
2006b), forming a simulator. This simulator is composed of a set of reusable components
which can be changed in order to reconfigure the simulator (SANTOS; BRITO;
BARBOSA, 2006c,d), changing solution methods or other types of behavior (SANTOS;
VIEIRA; LENCASTRE, 2003). This means that MPhyScaS simulators are flexible,
adaptable and maintainable.
2.1 MPhyScaS Architecture
The MPhyScaS architecture is proposed in (LENCASTRE, 2004). This architecture es-
tablishes a computational representation for the computational layers using patterns (see
Figure 2.1). The Kernel level represents the global scenery level, the level of the solution
algorithms is represented by the Block level, the level of solvers is represented by the
Group level, and the phenomena level is represented by the Phenomenon level.
Figure 2.1: Computational representation for the layers of the simulator.
The definition of this structure is aimed at improving the quality of simulator de-
signs. The defined architecture attempts to fill in the existing gap in the development of
FEM simulators for multi-physics and multi-scale problems. The main requirements of
the MPhyScaS architecture are:
• Flexibility in the development of simulators;
• Extensibility of simulators through the integration of components;
• Improved reusability of processes, data and models.
The MPhyScaS architecture is organized as follows:
• The first layer is the Kernel and it represents the overall scenery of the simulation.
It is responsible for the initialization of procedures (transferred to Blocks in the lower
level), for global time loops and iterations, for global adaptive iterations, and for
articulation of activities to be executed by the Blocks in the lower level. The Kernel
stores system data related to the parameters for its loops and iterations. Therefore,
in a simulation only one Kernel can exist.
• The second layer is the Block and it represents the solution algorithms, in other
words, the way linear systems are built and solved. It is responsible for the transfer
of incoming demands from the Kernel to its Groups in the lower level (initialization
procedures, for instance); for Block local time loops and iterations (inner loops
and iterations inside a global time step, restricted to groups of Phenomena); for
procedures inside time stepping schemes; for Block local iterations (restricted to
some groups of Phenomena); for Block local adaptive iterations (restricted to some
groups of Phenomena); and for operations with global quantities (transferred to Groups
in the lower level, which are the owners of global quantities). The Blocks serve the
Kernel level. Each Block is responsible for a certain number of Groups, which
cannot be owned by another Block. All demands from a Block to the lower level should
be addressed to its Groups. The Blocks store system data related to parameters for
their own loops and iterations and parameters for their procedures.
• The third layer is the Group and it represents the solvers of linear systems. It
is responsible for the transfer of incoming demands from its Block to Phenomena
in the lower level (initialization procedures, for instance); for the assembling coordination
and solution of systems of linear algebraic equations (the method used
depends on the solver component); for operations with global quantities (by demand
from its Block); and for articulation of activities to be executed by its Phenomena
in the lower level (basically concerned with computation and assembling of global
matrices and vectors). The Groups serve their respective Blocks. Each Group is
responsible for a certain number of Phenomena, which cannot be owned by another
Group. All demands from a Group to the lower level should be addressed to its
Phenomena only. The Groups store global matrices, vectors and scalars, and store
the GroupTasks, which are objects encapsulating standard procedures where articulation
of the Group's Phenomena is needed. The GroupTasks are programmable
and their data are standard pieces of information, depending only on the type of the
GroupTask.
• The last layer is the Phenomenon. It is responsible for the computation of local
matrices, vectors and scalars (Phenomenon quantities), and for operations involving
matrices and vectors at the finite element level and their assembling into given
global matrices and vectors. The Phenomena serve their respective Groups. The
Phenomena store data related to constitutive parameters or other parameters which
are specific to the respective Phenomenon; they store the geometry where the respective
Phenomenon is defined (different Phenomena may share a geometry or a part of it);
and they store WeakForms, which are tools for computing and assembling quantities defined
on a certain part of the geometry. A WeakForm may be active or not. Only active
WeakForms can be used during a simulation. A WeakForm may store parameters
which are related to specific simulation data (for instance, functions for the definition
of the boundary conditions or parameters needed for the computation of a
quantity, which should be given together within a simulation data set). The Phenomenon
should also store methods, which are tools to be used in certain Phenomenon-specific
tasks. For instance, those tasks can be the generation of geometric and Phenomenon
meshes, numerical integration at the element level, shape functions, etc.
The simulation starts with the execution of the root of the Kernel, which uses services
provided by the Blocks. Each Block in turn uses services provided by the Groups. Each
Group owns a set of Phenomena, which are used to produce local structures
(matrices, vectors, and scalars) and assemble them into a given (by the Group)
global one.
Figure 2.2 shows that the Kernel can articulate several Blocks; each Block can articulate
several Groups; each Group can articulate several Phenomena; and each Phenomenon
is capable of computing and assembling a set of local Quantities (matrices, vectors and
scalars).
Figure 2.2: Each Phenomenon is capable of computing a set of quantities.
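To make the containment between layers concrete, the following sketch (Python; the class and attribute names are illustrative only and are not part of MPhyScaS) mirrors the Kernel → Block → Group → Phenomenon hierarchy described above.

# A minimal structural sketch of the MPhyScaS layer hierarchy; names are illustrative only.
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class Phenomenon:
    name: str
    quantities: List[str] = field(default_factory=list)      # local matrices/vectors/scalars it can compute

@dataclass
class Group:
    name: str
    phenomena: List[Phenomenon] = field(default_factory=list)  # Phenomena owned exclusively by this Group
    states: Dict[str, object] = field(default_factory=dict)    # the States of its Phenomena live here

@dataclass
class Block:
    name: str
    groups: List[Group] = field(default_factory=list)          # Groups owned exclusively by this Block

@dataclass
class Kernel:
    blocks: List[Block] = field(default_factory=list)          # a simulation has exactly one Kernel
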
The States that configure the Phenomenon are stored in the respective Group, where
solvers for linear systems are located. This is convenient, due to the fact that the Group is
responsible not only for assembling and solving algebraic systems, but also for operating
with the structures when requested by its Block (see Figure 2.3).
Figure 2.3: Each Phenomenon has its own set of States, which is stored in its Group.
A quantity that a Phenomenon can compute and assemble can be coupled with another
Phenomenon's states (one or more), as depicted in Figure 2.4. MPhyScaS provides
all the machinery to make this procedure automatic, following the specification of some
data related to the place where the coupling occurs. These data should be given to the object
responsible for the computation.
Figure 2.4: A quantity computed by a Phenomenon may be coupled to a State from another
Phenomenon.
2.2 MPhyScaS Parallel Architecture
The original architecture of MPhyScaS provides support to the automatic building of
sequential simulators only. For instance, it does not have abstractions that could automatically
define the distribution of data and procedures and their relationships across a cluster
of PCs. The MPhyScaS parallel architecture (called MPhyScaS-P) satisfies a number of
new requirements, including the support of parallel execution of the simulators in clusters
of PCs.
MPhyScaS-S (sequential) is a framework with the support of an extended finite element
library and a knowledge management system. MPhyScaS-P uses the same extended
library and knowledge management system as MPhyScaS-S (with minor differences).
It also makes use of the concept of layers already used in MPhyScaS-S in order to define
a hierarchical parallel structure.
At least four hierarchical levels of different procedures and memory usage can be
found in modern clusters of PCs (SANTOS et al., 2008):
• Cluster Level: it is composed of all processes running in all machines being used
in a simulation;
• Machine Level: it is composed of all processes running in one individual machine
among all those used in a simulation;
• Processor Level: it is composed of all processes running in one individual processor
among all those running in one individual machine;
• Process Level: it is composed of one single process running in one individual
processor among all other processes in this same processor. It can be divided into
two groups:
- Core Sub-Level: it is composed of all parts of the code from one individual
process which are not strongly hardware specific;
- Software-Hardware (SH) Sub-Level: it is composed of all parts of the code
from one individual process which are strongly hardware specific (cache management,
FPGA acceleration).
Whenever the architecture of a computational system allows for a hierarchy of procedures,
it may be a good idea to define a hierarchy of processes in such a way that a few
of them accumulate some very light management tasks. The benefits of this strategy
include (SANTOS et al., 2008): (i) procedures can be hierarchically synchronized
(from coarse to fine grain), reducing management concerns and increasing correctness;
(ii) since locality concerns change along the hierarchy levels, memory management can
become more and more specialized from top to bottom; (iii) the hierarchy allows for the
encapsulation of services and procedures, making the exchange of components easier.
Besides the natural benefit of this aspect, it also allows for the adaptation of the code to
new hardware and software technologies, without incurring heavy reprogramming in
all levels of the hierarchy.
The MPhyScaS-S architecture has satisfactorily solved the main problems related to
data dependence and sharing between phenomena for sequential processing. It has also
been able to represent solution algorithms in such a way that the entire simulator can
be reconfigured with a minimum amount of reprogramming (maximum amount of reuse).
interprocess communication, which can be of four types (SANTOS et al., 2008):
1. Communication during linear algebra operations: Interprocess communication
is needed during several linear algebra operations. For instance, parallel matrix-
vector multiplication requires interprocess communication in order to complement
the job done by each process;
2. Communication along process hierarchy: The establishment of the hierarchy of
procedures will require some processes to assume some kind of leadership, depend-
ing on the layer where they are located. This will require interprocess communica-
tion throughout the hierarchy;
3. Communication for coupled information - I: MPhyScaS-S has dealt with coupling
dependencies between different phenomena already, but in parallel processing
this type of dependence can become very complex. For instance, this is the case
whenever the interface between two coupled phenomena coincides with a boundary
between two components of the mesh partition. That means that vector fields and
respective mesh data have to be transferred from one process to the other;
4. Communication for coupled information - II: If two coupled phenomena have
different geometric meshes, all components in one mesh partition may be different
from all components in the other partition. Thus, there will be a need for interprocess
data transfer from one process to the other whenever coupled quantities have
to be computed.
Well-thought-out distribution schemes and data representation abstractions can eliminate
both types of communication for coupled information. This can be done by copying the
coupled vector fields and mesh data to the processes where they are needed. Those copies
will be updated whenever needed (communication of type 1 only). Processes will have to
be given a larger memory space, but that can be kept to a minimum. The important thing is
that the whole data set of a coupled mesh will not be transferred between two processes
when a coupled quantity is to be computed. Thus, only types 1 and 2 will be needed.
They will be called communication of Type-I and Type-II, respectively.
In order to simplify the presentation of MPhyScaS-P's architecture, some assumptions
have to be made (SANTOS et al., 2008): (i) if two or more phenomena are coupled in one
geometric entity, then they share the same geometric mesh on that geometric entity; (ii) all
phenomena should be represented in each process with a nonempty geometric mesh; (iii)
only three hierarchical levels will be considered: Cluster, Machine, and Process levels.
There are two views of the MPhyScaS-P’s architecture (SANTOS et al., 2008):
• Logical View: The logic of MPhyScaS-P's workflow is the same as MPhyScaS-S's,
that is, it has the same levels of procedures (Kernel, Blocks, Groups and Phenomena),
all relationships between them are preserved, and all procedures within each
layer are technically the same (besides the fact that data are now distributed). Therefore,
the relationships among entities in the different levels of MPhyScaS-P also form
a DAG (Directed Acyclic Graph). Thus, we are able to clone a suitable modification
of MPhyScaS-S to all processes in an SPMD (Single Program, Multiple Data)
scheme. In this sense, one can imagine that MPhyScaS-P is MPhyScaS-S with
distributed data and a hierarchical synchronization scheme (see Topological View
below);
• Topological View: The topology of the procedures in the workflow of MPhyScaS-S
is implemented in MPhyScaS-P in a hierarchical form with the aid of a set of
processes, which are responsible for the procedure synchronization. There are
three types of leader processes (see Figure 2.5):
- Cluster Rank Process: It is responsible for the execution of the Kernel and for
synchronizing the beginning and the end of each one of its level's tasks, which
requires demands to the lower level processes. In a simulation there is only one
ClusterRank process (for instance, the process with rank equal to zero in an
MPI based system). Figure 2.6 depicts the relationship between a ClusterRank
process and the simulator layers;
- Machine Rank Process: One process is chosen among all processes running
in an individual machine to be its leader. Thus, there is only one MachineRank
process per machine. It is responsible for the execution of procedures in the
Block level and for synchronizing the beginning and the end of each one of its
level's tasks, which requires demands to lower level processes. The ClusterRank is
also the MachineRank in its own machine. Figure 2.7 depicts the relationship
between a MachineRank process and the simulator layers;
- Process Rank Process: It is responsible for the execution of the procedures
in the Group level. The ClusterRank and all MachineRank processes are also
ProcessRank processes. Figure 2.8 depicts the relationship between a ProcessRank
process and the simulator layers.
Figure 2.5: Hierarchy of the simulator in MPhyScaS-P.
Figure 2.6: Layers with procedures executed by ClusterRank in MPhyScaS-P (process 000 executes procedures in all layers from the Kernel downwards).
Figure 2.7: Layers with procedures executed by MachineRank in MPhyScaS-P (process 100 executes procedures in all layers from the Block downwards).
Figure 2.8: Layers with procedures executed by ProcessRank in MPhyScaS-P (the remaining processes execute procedures in all layers from the Group downwards).
Knowing that MPhyScaS-S transfers commands from the Kernel level down to the
Phenomenon level in the form of a tree structure, it can be seen that a ClusterRank only
demands services from all MachineRanks, which only demand services from all of their
ProcessRanks. Since the activities in one level return to the level immediately above
after they are finished (with the exception of the Kernel level), there are natural ways
of synchronizing each activity. Furthermore, there is certainly an advantage in the
tremendous simplification of the synchronization tasks. Note that the activities in the Group
and Phenomenon levels are left for a finer granularity of management. In both levels there
are well-localized CPU-intensive operations, which could be accelerated with suitable
software optimization and the use of hardware devices.
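As a small illustration of this hierarchy of leader processes (not code from MPhyScaS-P), the sketch below assigns the ClusterRank, MachineRank and ProcessRank roles from a hypothetical mapping of process ranks to machines, following the rules above: rank 0 is the ClusterRank, each machine has exactly one MachineRank, and every process is a ProcessRank.

# Hypothetical layout: process rank -> machine name.
machine_of = {0: "m0", 1: "m0", 2: "m0", 3: "m1", 4: "m1"}

def assign_roles(machine_of):
    roles = {rank: {"ProcessRank"} for rank in machine_of}   # every process is a ProcessRank
    roles[0].add("ClusterRank")                               # rank 0 leads the whole cluster
    leader_of = {}                                            # machine -> lowest rank running on it
    for rank, machine in sorted(machine_of.items()):
        leader_of.setdefault(machine, rank)
    for machine, leader in leader_of.items():
        roles[leader].add("MachineRank")                      # one MachineRank per machine
    return roles

for rank, role_set in sorted(assign_roles(machine_of).items()):
    print(rank, sorted(role_set))

Note that, as stated above, the ClusterRank (rank 0) is also the MachineRank of its own machine and a ProcessRank.
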
Chapter 3
Scheduling Algorithms
Scheduling is a decision-making process that is used on a regular basis in many manufacturing
and services industries. It deals with the allocation of resources to tasks over
given time periods and its goal is to optimize one or more objectives (PINEDO, 2008).
Definition 3.0.1 A task system is a weighted directed acyclic graph G = (V, E, ω) where:
• the finite set V of vertices represents the tasks;
• the set E of edges represents precedence constraints between tasks: e = (u, v) ∈ E
if and only if u ≺ v, meaning that task u precedes task v;
• the weight function ω : V → N* gives the weight (or duration) of each task. Task
weights are assumed to be positive integers.
A schedule σ of a task system is a function that assigns a beginning time and a target
processor to each task. A schedule must preserve the dependence constraints induced
by the precedence relation and materialized by the edges of the dependence graph: if
u ≺ v, then the execution of u begins at time σ(u) and requires ω(u) units of time, and
the beginning of the execution of v at time step σ(v) must be posterior to the end of the
execution of u. Assume the following formal definition (DARTE; ROBERT; VIVIEN,
2000).
Definition 3.0.2 A schedule of a task system G = (V, E, ω) is a function σ : V → N such
that σ(u) + ω(u) ≤ σ(v), whenever e = (u, v) ∈ E.
Often there are other constraints that must be met by schedules, called resource constraints.
When there is only a fixed number p of available processors, we say that we
have a problem with limited resources Pb(p). The resource constraints are expressed as
follows: each task v ∈ V is allocated to a processor alloc(v) using an allocation function
alloc : V → P, where P = {1, ..., p} denotes the set of available processors. The relationship
between the schedule σ and the allocation alloc is that no processor can be allocated to
more than one task at the same time step. This translates into the following condition:
T ≠ T′ and alloc(T) = alloc(T′)  ⟹  σ(T) + ω(T) ≤ σ(T′) or σ(T′) + ω(T′) ≤ σ(T)        (3.1)
This condition expresses the fact that if two tasks T and T′ are allocated to the same
processor, then the execution of T must be completed before the execution of T′ can
begin, or vice versa.
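Read together, Definition 3.0.2 and condition (3.1) give a direct validity check for a candidate schedule, sketched below (Python; the example tasks, times and allocation are invented).

# sigma: task -> start time, omega: task -> duration, alloc: task -> processor,
# edges: precedence pairs (u, v) meaning u must finish before v starts.
def is_valid_schedule(sigma, omega, alloc, edges):
    # Definition 3.0.2: sigma(u) + omega(u) <= sigma(v) for every edge (u, v).
    for u, v in edges:
        if sigma[u] + omega[u] > sigma[v]:
            return False
    # Condition (3.1): tasks allocated to the same processor must not overlap in time.
    tasks = list(sigma)
    for i, t in enumerate(tasks):
        for t2 in tasks[i + 1:]:
            if alloc[t] == alloc[t2]:
                overlap = not (sigma[t] + omega[t] <= sigma[t2] or
                               sigma[t2] + omega[t2] <= sigma[t])
                if overlap:
                    return False
    return True

# Hypothetical data: two processors, three tasks, one precedence edge.
sigma = {"A": 0, "B": 0, "C": 3}
omega = {"A": 3, "B": 2, "C": 1}
alloc = {"A": 0, "B": 1, "C": 0}
print(is_valid_schedule(sigma, omega, alloc, [("A", "C")]))   # True
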
3.1 Classification of Scheduling Problems
Scheduling problems can be classified in terms of three fields (BRUCKER, 2006):
• α, that specifies the machine environment;
• β, that specifies the job characteristics;
• γ, that denotes the optimality criterion.
3.1.1 Machine Environment
The machine environment is characterized by a string with two parameters α1 and α2,
represented as α = α1 α2. The first parameter α1 can assume the following values:
α1 ∈ {◦, P, Q, R, PMPM, QMPM, G, J, F, O, X}. The second parameter α2 should assume
a positive integer value to denote the number of machines available. If α2 is an arbitrary
but fixed number of machines, α2 must be set to ◦.
If α1 is set to ◦, this means that there are no parallel machines. With this parameter,
all jobs must be processed in a single machine.
If α1 is set to {P, Q, R}, it represents an environment with parallel machines. For
identical parallel machines, α1 must be set to P. In this case, the processing time of a job
is the same for all machines (see Definition 3.1.1).
Definition 3.1.1 Let J = {j0, j1, ..., jn} be the set of jobs and M = {m0, m1, ..., mx} be the set
of machines. Then, ∀ mk ∈ M, ω(ji, mk) = pi.
Q represents the uniform parallel machines, that is, machines that run at different but
related speeds. Thus, the processing time of a job depends on the machine speed, as
shown in Definition 3.1.2. Finally, if α1 is set to R, then there are unrelated parallel
machines. In this case, we cannot define the processing time of a job.
Definition 3.1.2 Let J = {j0, j1, ..., jn} be the set of jobs, M = {m0, m1, ..., mx} be the set
of machines, and S = {s0, s1, ..., sx} be the set of speeds of the machines in M. Then,
∀ mk ∈ M, ω(ji, mk) = pi / sk.
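The sketch below (Python; the jobs, machines and speeds are invented) contrasts the three parallel-machine environments: for identical machines (P) the processing time is simply pi, for uniform machines (Q) it is pi/sk, and for unrelated machines (R) it has to be given as an arbitrary job-by-machine table.

# Hypothetical data: base processing requirements p_i and machine speeds s_k.
p = {"j0": 6.0, "j1": 4.0}          # job -> base processing requirement
s = {"m0": 1.0, "m1": 2.0}          # machine -> speed (uniform machines, alpha_1 = Q)

def omega_identical(job, machine):   # alpha_1 = P: same time on every machine
    return p[job]

def omega_uniform(job, machine):     # alpha_1 = Q: time scales with the machine speed
    return p[job] / s[machine]

# alpha_1 = R: unrelated machines, processing times form an arbitrary job x machine table.
omega_unrelated = {("j0", "m0"): 6.0, ("j0", "m1"): 5.0,
                   ("j1", "m0"): 4.0, ("j1", "m1"): 9.0}

print(omega_identical("j0", "m1"))   # 6.0
print(omega_uniform("j0", "m1"))     # 3.0
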
When the machine environment contains multi-purpose machines, they can be specified
as PMPM or QMPM. In the first case, there are multi-purpose machines with identical
speeds. In the second case, there are multi-purpose machines with uniform speeds.
In the other cases (α1 ∈ {G, J, F, O, X}), we have a multi-operation model, i.e.,
associated with each job Ji there is a set of operations Oi1, Oi2, ..., Oin. The machines are
dedicated. Furthermore, there are precedence relations between arbitrary operations. This
general model is called a general shop. We indicate the general shop by setting α1 = G.
The symbols J, F, O, X designate special cases of the general shop.
The job shop scheduling problem, represented by α1 = J, is briefly described as
follows (TSUTSUMI; FUJIMOTO, 2005). There is a set of jobs and a set of machines.
Each job consists of a sequence of operations which have precedence constraints of the
form Oij → Oi,j+1 and must be processed in order. Each operation needs a fixed duration
on one of the machines. Each machine can process at most one operation at a time,
and each operation is not interrupted once it has started. Figure 3.1 shows an example of
a job shop scheduling problem with three jobs (each of them with four operations) being
executed on four machines.
Figure 3.1: Example of a job shop scheduling problem.
When α1 is set to F, it indicates a flow shop scheduling problem. In this
scheduling problem, there is a set of machines and each operation Oij of a job i must be
processed on machine mj. Here, the operations of a job have the same precedence
constraints as in the job shop scheduling problem. Figure 3.2 shows an example of a flow
shop scheduling problem. In this case, the number of operations of each job should be
equal to the number of machines (WANG et al., 2005).
Figure 3.2: Example of a flow shop scheduling problem.
The open shop scheduling problem, denoted by O, is similar to the flow shop scheduling
problem. The main difference between them is that in the flow shop problem the
operations of each job need to be processed in order, i.e., one may not begin processing
operation Oij until operation Oi,j-1 has been completed (LEUNG; KELLY; ANDERSON,
2004). In the open shop problem the order in which operations are processed is irrelevant.
Figure 3.3 illustrates an open shop scheduling problem.
Figure 3.3: Example of an open shop scheduling problem.
The mixed shop scheduling problem is denoted by the symbol X. The mixed shop problem is a combination of the job shop problem and the open shop problem (BRUCKER, 2006). Thus, we have open shop jobs and job shop jobs. The number of operations of job J_i is denoted by n_i.
3.1.2 Job Characteristics
The job characteristics are specified by a set β containing at most six elements β_1, β_2, β_3, β_4, β_5, and β_6 (BRUCKER, 2006).
β_1 indicates whether preemption is allowed. Preemption is the act of temporarily interrupting a job with the intention of resuming it at a later time. In other words, the preempted job is resumed and put back into execution from the point of preemption, without loss of the prior work (BOBBIO, 1994). If preemption is allowed, we set β_1 = pmtn. Otherwise, β_1 does not appear in β.
β_2 describes the precedence relations between jobs. A precedence relation is a binary relation between jobs, that is, "job J_i precedes job J_j" means that the processing of job J_j cannot be started until the completion of job J_i. A set of precedence relations can be represented by a Directed Acyclic Graph (DAG) G. If G is an arbitrary DAG, we set β_2 = prec. Other ways to represent it are chains, an intree, an outtree, a tree, or a series-parallel directed graph. In these cases, we set β_2 equal to chains, intree, outtree, tree, and sp-graph, respectively. If there are no precedence relations, β_2 does not appear in β.
If β_3 = r_i, then release dates may be specified for each job. The release date is the time before which the job cannot start; this means that each job starts only after its release date. If r_i = 0 for all jobs, then β_3 does not appear in β.
β_4 specifies restrictions on the processing times. If β_4 is equal to p_i = 1 for all jobs, then each job has a unit processing requirement, which means that each job requires only one unit of time to be processed. Similarly, β_4 can be easily interpreted when set to p_i = p, which means that each job has processing time equal to p, or to p_i ∈ {1, 2}, which means that the processing time of each job is either 1 or 2.
The deadline of a job is specified in β_5. For each job J_i a deadline d_i must be specified. The deadline represents the latest time at which a job may finish its execution; in other words, the job must finish no later than time d_i.
Some scheduling algorithms consider sets of jobs grouped into batches. A batch is a set of jobs which must be processed jointly on a machine (KASHAN; KARIMI; JENABI, 2008). A batching problem is to group the jobs into batches and schedule them. There are two types of batching problems: the p-batching problem and the s-batching problem. In the p-batching problem, the length of a batch is equal to the maximum of the processing times of all jobs in the batch. In the s-batching problem, the length of a batch is equal to the sum of the processing times of all jobs in the batch. If the scheduling problem considers batches, then β_6 must be set to β_6 = p-batch or β_6 = s-batch. Otherwise, β_6 does not appear in β.
3.1.3 Optimality Criterion
The optimality criterion specifies what needs to be optimized in the problem and how this optimization should occur. It must be represented as a mathematical function involving the relevant properties of the problem, in order to measure the quality of a solution (BARESEL; STHAMER; SCHMIDT, 2002). This function should guide the process of searching for the best value for the problem (BARESEL et al., 2004), which is the optimal value of the function. The optimization criterion can be the minimization or the maximization of this function, called the objective function.
In scheduling problems, there are some objective functions that are known as the most common ones: the makespan (max{C_i | i = 1, ..., n}), the total flow time (Σ_{i=1}^{n} C_i), and the weighted total flow time (Σ_{i=1}^{n} ω_i C_i) (BRUCKER, 2006), where C_i is the finishing time of job J_i. In these cases we write γ = C_max, γ = Σ C_i, and γ = Σ ω_i C_i, respectively.
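As a minimal sketch with made-up completion times and weights, the three objective functions can be evaluated as follows:

```python
# Hypothetical example: evaluating the three common objective functions
# from the completion times C_i of the jobs.
C = [3, 5, 9, 12]   # completion times C_i (made-up values)
w = [1, 2, 1, 3]    # weights omega_i (made-up values)

makespan = max(C)                                           # C_max
total_flow_time = sum(C)                                    # sum of C_i
weighted_flow_time = sum(wi * ci for wi, ci in zip(w, C))   # sum of omega_i * C_i

print(makespan, total_flow_time, weighted_flow_time)        # 12 29 58
```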
3.2 Compared Scheduling Algorithms
In scheduling problems, n jobs with processing times p_1, p_2, ..., p_n have to be distributed among m identical machines so as to minimize the time span of the resulting schedule. If the completion time of the jth job is denoted by C_j, then this criterion amounts to the minimization of C_max = max_{j=1,...,n} {C_j}, also called the makespan. Note that a solution corresponds to an assignment of jobs to machines, and the exact order in which the jobs are processed on the machines is important due to the dependencies between jobs. Furthermore, we will consider identical machines, so we do not distinguish between them. Therefore, an assignment of jobs to machines is actually a partition of the jobs into m subsets.
Some scheduling algorithms are presented in this section: list scheduling; SPT (Short-
est Processing Time); and LPT (Longest Processing Time). They will be compared to the
algorithm presented in Chapter 5. This comparison will be done in terms of makespan,
the communication cost, and the waiting time of each processor. One can note that the
algorithms presented in this section only consider the first criterion.
3.2.1 List Scheduling
One of the most often used general approximation strategies for solving scheduling prob-
lems is list scheduling (BLAZEWICZ et al., 2007). The basic idea of the list scheduling
is to make a scheduling list (a sequence of nodes for scheduling) by assigning them some
priorities. A node is stored in a priority ordered list as soon as it is ready. Nodes that have
all immediate predecessors executed are referred to as being ready.
In a traditional scheduling algorithm, the scheduling list is statically constructed be-
fore node allocation begins, and most importantly, the sequencing in the list is not mod-
ified (SIMION et al., 2007). Depending on the way the priority list is constructed, the schedules produced by such a heuristic may be quite poor or quite good. Thus, this class of heuristics provides a natural vehicle for the analysis of the relation between solution quality and robustness (KOLEN et al., 1994).
When tasks are scheduled to resources, there may be some holes between scheduled tasks due to dependencies between the tasks of the application. When a task is scheduled into the first available hole, this is called the insertion approach. If the task can only be scheduled after the last task scheduled on the resource, without considering holes, it is called a non-insertion approach. The insertion approach performs much better than the non-insertion one, because it utilizes idle times better (SIMION et al., 2007).
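A minimal sketch of the insertion approach is given below; the helper name and the interval representation are our own assumptions, not taken from the cited works. Given the busy intervals already scheduled on one resource, it returns the earliest start time at which a task of a given duration fits, either in a hole between scheduled tasks or after the last one.

```python
# Simplified sketch of the insertion approach: find the earliest start time
# for a task of length `duration` that is ready at time `ready`, given the
# busy intervals (start, end) already scheduled on one resource.
def earliest_insertion_start(busy, ready, duration):
    busy = sorted(busy)
    candidate = ready
    for start, end in busy:
        if candidate + duration <= start:   # the task fits in the hole before this interval
            return candidate
        candidate = max(candidate, end)     # otherwise try after this interval
    return candidate                        # after the last scheduled task

# Example: intervals [0,2) and [5,9) leave a hole [2,5); a task of length 3
# that is ready at time 1 can be inserted at time 2.
print(earliest_insertion_start([(0, 2), (5, 9)], ready=1, duration=3))  # 2
```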
The list scheduling algorithm then executes the steps in the following pseudo-code. The first and second lines initialize a list of ready jobs and sort it into priority order. In this stage, the ready jobs are the ones that have no dependences even before the scheduling algorithm begins. In lines 3 to 5, s_k is initialized, where s_k represents the sum of the execution costs assigned to processor p_k. Then, the algorithm repeatedly assigns the first job of the ready jobs list to the first free processor (lines 7 to 9) until all jobs are assigned to some processor. In the same loop, the algorithm also updates the s_k value of the processor that received the job: s_k is increased by the execution cost of the job (C_ji, line 10). Before executing the next iteration of the loop, the algorithm updates the ready jobs list by removing the assigned jobs and adding the jobs that became ready at the correct position in the list (sorted by priority). At the end of the algorithm it is possible to know the makespan value, which is the maximum value of s_k (line 13).
Pseudo-Code 1: List scheduling algorithm pseudo-code.
1  Initialize Ready Jobs
2  Sort Ready Jobs By Priority
3  foreach processor p_k, k = 1, ..., m do
4      s_k = 0
5  end
6  while Ready Jobs ≠ ∅ do
7      j_i = Get First Job Of Ready Jobs
8      p_k = Get First Processor Free
9      Assign Job To Processor(j_i, p_k)
10     s_k = s_k + C_ji
11     Update Ready Jobs
12 end
13 makespan = max{s_k | k = 1, ..., m}
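A minimal executable sketch of Pseudo-Code 1 is shown below. It is a simplification under our own assumptions (priorities and execution costs are given directly, dependences are handled through a predecessor count, and, as in the pseudo-code, only the accumulated cost s_k per processor is tracked, not start times); the names are hypothetical.

```python
import heapq

def list_schedule(cost, priority, preds, m):
    """Greedy list scheduling: cost[j] is the execution cost of job j, priority[j]
    its priority (smaller = scheduled first), preds[j] the set of its predecessors,
    and m the number of processors. Returns (makespan, assignment)."""
    n = len(cost)
    remaining = [len(preds[j]) for j in range(n)]
    succs = [[] for _ in range(n)]
    for j in range(n):
        for p in preds[j]:
            succs[p].append(j)
    ready = [(priority[j], j) for j in range(n) if remaining[j] == 0]
    heapq.heapify(ready)
    load = [0.0] * m                    # s_k: accumulated cost on each processor
    assignment = [None] * n
    while ready:
        _, j = heapq.heappop(ready)     # first job of the ready list
        k = min(range(m), key=lambda q: load[q])   # first free (least-loaded) processor
        assignment[j] = k
        load[k] += cost[j]              # s_k = s_k + C_j
        for s in succs[j]:              # update the ready list
            remaining[s] -= 1
            if remaining[s] == 0:
                heapq.heappush(ready, (priority[s], s))
    return max(load), assignment

# Hypothetical instance: 4 jobs, job 3 depends on jobs 0 and 1, two processors.
print(list_schedule([4, 2, 1, 3], [0, 1, 2, 3], [set(), set(), set(), {0, 1}], 2))
# -> (6.0, [0, 1, 1, 1])
```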
As most heuristics, list scheduling assumes fully connected homogeneous processors and ignores contention on the communication links (SINNEN; SOUSA, 2001). Overall, list scheduling algorithms have been widely used in synthesis systems (MICHELI, 1993). Two well-known examples of permutation list scheduling rules are the SPT (Shortest Processing Time) and the LPT (Longest Processing Time) rules. The quality of the solutions produced by these rules is very different (KOLEN et al., 1994).
3.2.2 Longest Processing Time - LPT
One of the simplest algorithms is the LPT algorithm (BLAZEWICZ et al., 2007), in which the tasks are arranged in order of decreasing processing times. The LPT rule tries to place shorter jobs towards the end of the schedule, where they can be used to balance the load. In the case of identical parallel machines, it has been shown that the makespan of the schedule constructed with this rule is less than 4/3 of the optimal makespan (ALTENDORFER; KABELKA; STOCHER, 2007). It is hence also used for maximizing machine utilization.
To demonstrate the number of possible assignments that can be generated by a given LPT rule when uncertainty exists with respect to one of the processing times, we will conduct an experiment carrying out a parametric analysis with respect to p_1. This means that we will study the LPT rule when p_1 = λ, where λ gradually increases from zero to infinity. Besides the number of solutions generated by the LPT rule, it is also natural to study the solution value C_max as a function of λ (KOLEN et al., 1994).
We will present an example with two machines and four jobs having processing times p_1 = λ, p_2 = 1, p_3 = 2, and p_4 = 4, which are scheduled according to the LPT rule. The different schedules and the corresponding linear parts of the makespan (C_max) are given in Figure 3.4.
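The makespans summarized in Figure 3.4 can be reproduced with a minimal sketch, assuming the greedy reading of LPT in which the next job on the list is assigned to the currently least-loaded machine:

```python
def lpt_makespan(times, m=2):
    """LPT: sort jobs by non-increasing processing time and assign each to the
    least-loaded machine; return the resulting makespan."""
    load = [0.0] * m
    for t in sorted(times, reverse=True):
        k = min(range(m), key=lambda q: load[q])
        load[k] += t
    return max(load)

# The example of Figure 3.4: p1 = lam, p2 = 1, p3 = 2, p4 = 4, two machines.
for lam in (0.5, 1.5, 2.5, 3.5, 4.5, 5.5, 6.5, 8.0):
    print(lam, lpt_makespan([lam, 1, 2, 4]))
# The printed makespans follow the ranges summarized in Figure 3.4.
```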
Let us now look at an arbitrary permutation list scheduling heuristic applied to an arbitrary problem instance. When λ is increased from zero to infinity, the ordering of the jobs according to non-decreasing processing times will change. Hence the list will change too, although the relative order of the jobs 2, 3, ..., n will remain the same. If λ is equal to one of the processing times p_2, ..., p_n, then we may assume that, in the order according to non-decreasing processing times, job 1 follows directly after the job with the largest index whose processing time equals λ. Thus, a further increase of λ leaves the current ordering unchanged until λ becomes equal to the next larger processing time. It is easily verified that C_max is continuous in λ = p_j (j = 2, ..., n) (KOLEN et al., 1994).
3.2.3 Shortest Processing Time - SPT
According to the SPT rule, jobs appear on the list in order of increasing processing times. For n jobs with processing times p_1 = λ < p_2 ≤ ... ≤ p_n and j = 1, 2, ..., p, we may assume that the job with processing time p_j is scheduled on processor m_j. Since m_1 is the first processor which becomes available again, the job with processing time p_{j+1} is scheduled on m_1. Because p_1 + p_{j+1} ≥ p_j, processor m_1 now becomes the maximum processor and the job with processing time p_{j+2} is scheduled on m_2. Since p_2 ≥ p_1 and p_{j+2} ≥ p_{j+1}, processor m_2 now becomes the maximum processor and the job with processing time p_{j+3} is scheduled on m_3. Continuing this way, we find that an SPT schedule can always be assumed to have the following structure: job j (the one with processing time p_j) is scheduled on processor m_i, where i is given by i = [(j - 1) mod p] + 1, j = 1, ..., n, and p is the number of available processors. The maximum processor is the one processing job n.
The SPT rule has its strength with respect to utilization maximization especially if
some jobs take considerably longer than average. It avoids clogging a very busy ma-
chine with such a long-duration job by giving preference to jobs with shorter processing
times, reserving the long jobs for inflexible machines (ALTENDORFER; KABELKA;
STOCHER, 2007). When the objective is the sum of completion times, the minimum
value is achieved by sequencing the jobs in order of Shortest Processing Time. No mat-
ter what deadline is specified, SPT sequencing maximizes the number of jobs completed
by that time; choosing the small tasks obviously works better than choosing the large
tasks, if your objective is to complete as many as possible. It is not hard to see that
SPT is also optimal for the sum of waiting times (and the average of waiting times) as
well (BLAZEWICZ et al., 2007).
The different schedules and the corresponding linear parts of the makespan (C_max) for the same example presented in the previous section are shown in Figure 3.5. These schedules are produced according to the SPT rule. Here, λ also increases from zero to infinity, where p_1 = λ, p_2 = 1, p_3 = 2 and p_4 = 4.
Then, it is possible to say that SPT has mainly two properties (LEUNG; KELLY; ANDERSON, 2004):

1. When the objective is the sum of completion times, the minimum value is achieved by sequencing the jobs in the order of Shortest Processing Time - This property articulates the virtue of shortest-first sequencing. For any set of n jobs, shortest-first priority leads to a sequence that minimizes the sum of completion times (see the sketch after this list). Thus, whenever the quality of a sequence is measured by the sum of completion times, we can be sure that the appropriate sequencing rule is shortest first.

2. When the objective is the sum of weighted completion times, the minimum value is achieved by sequencing the jobs in the order of Shortest Weighted Processing Time - This reveals that, to reconcile the conflict between processing time and importance weight, we should use the ratio of time to weight. Viewed another way, the implication is that we should first scale the processing times (divide each one by its corresponding weight) and then apply the shortest-first principle to the scaled processing times.
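To illustrate Property 1, a minimal sketch with made-up processing times compares the sum of completion times of the shortest-first order with that of the longest-first order on a single machine:

```python
from itertools import accumulate

def total_completion_time(order):
    """Sum of completion times when jobs are processed in the given order on one machine."""
    return sum(accumulate(order))

times = [4, 1, 3, 2]                  # made-up processing times
spt = sorted(times)                   # shortest first
lpt = sorted(times, reverse=True)     # longest first
print(total_completion_time(spt))     # 1 + 3 + 6 + 10 = 20
print(total_completion_time(lpt))     # 4 + 7 + 9 + 10 = 30
```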
Recall that a sequencing problem is defined by a set of jobs and their processing times, along with an objective. The choice of objective is a key step in specifying the problem, and the use of the sum of completion times may sound specialized. However, SPT is optimal for other measures as well.

First, note that Property 1 applies to the objective of the average completion time. The average would be computed by dividing the sum of completion times by the (given) number of jobs. In general, that would mean dividing by the constant n. Therefore, whether we believe that the sum of completion times or the average of the completion times more accurately captures the measurement of schedule effectiveness, we can say that SPT is optimal (CHAJED; LOWE, 2008).
Second, consider the objective of completing as many jobs as possible by a pre-specified deadline. No matter what deadline is specified, SPT sequencing maximizes the number of jobs completed by that time. This might not be surprising to anyone who has tried to squeeze as many different tasks as possible into a finite amount of time (LEUNG; KELLY; ANDERSON, 2004). Choosing the small tasks obviously works better than choosing the large tasks, if our objective is to complete as many as possible. Taking this observation to its limit, it follows that a shortest-first sequence maximizes the number of tasks completed.
Finally, consider the completion time as a measure of the delay experienced by an individual customer. We could argue that the total delay has two parts: the wait for the work to start and the time of the actual work itself. The processing time is a function of the order requested by the customer and is therefore likely to be under the customer's control. The wait until the work starts depends on the sequence chosen. Thus, we might prefer to take as an objective the sum of the wait times. Although this seems at first glance to be a different problem, it is not hard to see that SPT is optimal for the sum of the wait times (and the average of the wait times) as well (KOLEN et al., 1994).
The dual benefits of SPT sequencing are fortuitous if we think in terms of the criteria that often guide managers of manufacturing or service systems as they strive to satisfy a given set of customer orders. One type of concern is the response time that customers experience. Customers would like delays in satisfying their orders to be short. If we quantify this concern with the average completion time objective, an optimal sequencing strategy is shortest-first. Another type of concern is the pile of unfinished orders in inventory, which managers would like to keep small. If we quantify this concern with the average inventory objective, then shortest-first remains an optimal strategy. In the context of our simple sequencing problem, at least, the delay criterion and the inventory criterion may sound quite different, but they are actually different sides of the same coin: both are optimized by SPT sequencing. Thus, whether a manager focuses on customer delays or on unfinished work in progress, or tries to focus on both at once, shortest-first sequencing is a good idea (KOLEN et al., 1994).
To summarize the three presented scheduling algorithms, Table 3.1 shows the objectives presented in this chapter and which algorithm can be used for each objective. Note that List Scheduling can be used for almost all objectives; this happens because this algorithm uses a priority list to order the jobs, and depending on the way the priority list is constructed, the schedules produced by such a heuristic may be quite good for any of the objectives (KOLEN et al., 1994). The GA column in Table 3.1 shows which objectives can be optimized using a GA. Our proposed GA does not optimize all of them.
Table 3.1: A summary of which algorithm can be used for each objective.
Objectives List Scheduling LPT SPT GA
makespan (C_max) X X X
completion time X X
waiting time X X X
communication cost X
load balance X X X
maximize machine utilization X X X
minimize the total delay X X X
Figure 3.4: Example of LPT schedules (0 < λ ≤ 1: makespan = 4; 1 < λ ≤ 2: makespan = 3 + λ; 2 < λ ≤ 3: makespan = 5; 3 < λ ≤ 4: makespan = 2 + λ; 4 < λ ≤ 5: makespan = 6; 5 < λ ≤ 6: makespan = 1 + λ; 6 < λ ≤ 7: makespan = 7; λ ≥ 7: makespan = λ).
Figure 3.5: Example of SPT schedules (0 < λ ≤ 1: makespan = 5; 1 < λ ≤ 2: makespan = 4 + λ; 2 < λ ≤ 3: makespan = 6; 3 < λ ≤ 4: makespan = 6; λ ≥ 4: makespan = 2 + λ).
Chapter 4
Identifying Job Dependencies
This chapter introduces the concepts of Petri nets, shows an extension of Petri nets called Coloured Petri Nets (CPN), and presents a model definition based on CPN to represent the elements of MPhyScaS data and their relationships. This model will permit us to analyze all levels of the architecture, identifying parallel processes in each architecture level and causal dependencies between the processes.
4.1 Petri Nets
Petri nets are an excellent formal model for studying concurrent and distributed systems and have been widely applied in many different areas of computer science and other disciplines (MURATA, 1989). Since Carl Adam Petri originally developed Petri nets in 1962 (PETRI, 1962), Petri nets have evolved through four generations: the first-generation low-level Petri nets, primarily used for modeling system control (REISIG, 1985); the second-generation high-level Petri nets, for describing both system data and control (JENSEN; ROZENBERG, 1991); the third-generation hierarchical Petri nets, for abstracting system structures (HE; LEE, 1991); and the fourth-generation object-oriented Petri nets, for supporting modern system development approaches (AGHA; DE CINDIO; ROZENBERG, 2001).
A Petri net is a particular kind of directed graph, together with an initial state called the initial marking, M_0. The underlying graph N of a Petri net is a directed, weighted, bipartite graph consisting of two kinds of nodes, called places and transitions, where arcs are either from a place to a transition or from a transition to a place. In graphical
representation, places are drawn as circles, transitions as bars or boxes. Arcs are labeled
with their weights (positive integers), where a k-weighted arc can be interpreted as the set
of k parallel arcs. Labels for unity weight are usually omitted. A marking (state) assigns
to each place a nonnegative integer. If a marking assigns to place p a nonnegative integer
k, we say that p is marked with k tokens. Pictorially, we place k black dots (tokens) in
place p. A marking is denoted by M, an m-vector, where m is the total number of places.
The pth component of M, denoted by M(p), is the number of tokens in place p.
In modeling, using the concept of conditions and events, places represent conditions,
and transitions represent events. A transition (an event) has a certain number of input
and output places representing the pre-conditions and post-conditions of the event, re-
spectively. The presence of a token in a place is interpreted as holding the truth of the
condition associated with the place. In another interpretation, k tokens are put in a place
to indicate that k data items or resources are available. The formal definition of a Petri net
is given in Definition 4.1.1 (MURATA, 1989).
Definition 4.1.1 (Petri net formal definition) A Petri net is a 5-tuple, PN = (P, T, F, W, M_0), where:
P = {p_1, p_2, ..., p_m} is a finite set of places,
T = {t_1, t_2, ..., t_n} is a finite set of transitions,
F ⊆ (P × T) ∪ (T × P) is a set of arcs (flow relation),
W : F → {1, 2, 3, ...} is a weight function,
M_0 : P → {0, 1, 2, 3, ...} is the initial marking,
P ∩ T = ∅ and P ∪ T ≠ ∅.
A Petri net structure N = (P, T, F, W) without any specific initial marking is denoted by N.
A Petri net with a given initial marking is denoted by (N, M_0).
The behavior of many systems can be described in terms of system states and their
changes. In order to simulate the dynamic behavior of a system, a state or marking in a
Petri net is changed according to the following transition (firing) rule:
1. A transition t is said to be enabled if each input place p of t is marked with at least
w(p, t) tokens, where w(p, t) is the weight of the arc from p to t.
2. An enabled transition may or may not fire (depending on whether or not the event
actually takes place).
3. A firing of an enabled transition t removes w(p, t) tokens from each input place p
of t, and adds w(t, p) tokens to each output place p of t, where w(t, p) is the weight
of the arc from t to p.
A transition without any input place is called a source transition, and one without any
output place is called a sink transition. Note that a source transition is unconditionally
enabled, and that the firing of a sink transition consumes tokens, but does not produce
any.
A pair of a place p and a transition t is called a self-loop if p is both an input and
output place of t. A Petri net is said to be pure if it has no self-loops. A Petri net is said to
be ordinary if all of its arc weights are 1’s.
The above transition rule is illustrated in Figure 4.1 using the well-known chemical reaction 2H_2 + O_2 → 2H_2O. Two tokens in each input place in Figure 4.1(a) show that two units of H_2 and O_2 are available, and the transition t is enabled. After firing t, the marking will change to the one shown in Figure 4.1(b), where the transition t is no longer enabled.
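A minimal sketch of this firing rule, using the net of Figure 4.1 as data (the dictionary-based representation is our own, not a Petri net library):

```python
# Minimal sketch of the Petri net firing rule for the net of Figure 4.1.
# Arc weights: w(H2, t) = 2, w(O2, t) = 1, w(t, H2O) = 2.
marking = {"H2": 2, "O2": 2, "H2O": 0}
inputs = {"t": {"H2": 2, "O2": 1}}      # w(p, t) for each input place p
outputs = {"t": {"H2O": 2}}             # w(t, p) for each output place p

def enabled(t, m):
    """A transition is enabled if every input place holds at least w(p, t) tokens."""
    return all(m[p] >= w for p, w in inputs[t].items())

def fire(t, m):
    """Firing removes w(p, t) tokens from each input place and adds w(t, p) to each output place."""
    assert enabled(t, m)
    for p, w in inputs[t].items():
        m[p] -= w
    for p, w in outputs[t].items():
        m[p] += w

print(enabled("t", marking))   # True: two units of H2 and O2 are available
fire("t", marking)
print(marking)                 # {'H2': 0, 'O2': 1, 'H2O': 2}
print(enabled("t", marking))   # False: t is no longer enabled
```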
Petri nets have also been extended in many different ways to study specific system properties, such as performance, reliability, and schedulability. Well-known examples of extended Petri nets include timed Petri nets (WANG, 1998), stochastic Petri nets (MARSAN et al., 1998), and coloured Petri nets (JENSEN, 1991; ROUBTSOVA, 2004; FUKUZAWA; SAEKI, 2002). Here, we present the coloured Petri nets, which will be used for determining dependencies between processes.
4.2 Coloured Petri Nets
Coloured Petri nets (CP-nets or CPNs) are a modeling language developed for systems in which communication, synchronization and resource sharing play an important role. CP-nets combine the strengths of ordinary Petri nets with the strengths of a high-level programming language (ROUBTSOVA, 2004). Petri nets provide the primitives for process interaction, while the programming language provides the primitives for the definition of data types and the manipulation of data values.
Figure 4.1: An illustration of a transition (firing) rule: (a) the marking before firing the enabled transition t; (b) the marking after firing t, where t is disabled.
CPN models can be made with or without explicit reference to time. Untimed CPN models are usually used to validate the functional/logical correctness of a system, while timed CPN models are used to evaluate the performance of the system (FUKUZAWA; SAEKI, 2002). There are many other languages which can be used to investigate either the functional/logical correctness of a system or its performance. However, it is rather seldom to find modeling languages that are well suited to both kinds of analysis.

CP-nets also offer more formal verification methods, known as state space analysis and invariant analysis. In this way it is possible to prove, in the mathematical sense of the word, that a system has a certain set of behavioral properties.
There are three different, but closely related, reasons to make CPN models (and other kinds of behavioral models). First of all, a CPN model is a description of the modeled system, and it can be used as a specification (of a system which we want to build) or as a presentation (of a system which we want to explain to other people, or to ourselves). By creating a model we can investigate a new system before we construct it. This is an obvious advantage, in particular for systems where design errors may compromise security or be expensive to correct. Secondly, the behavior of a CPN model can be analyzed, either by means of simulation (which is equivalent to program execution and program debugging) or by means of more formal analysis methods (which are equivalent to program verification). Finally, it should be understood that the process of creating the description and performing the analysis usually gives the modeler a dramatically improved understanding of the modeled system, and it is often the case that this is more valuable than the description and the analysis results themselves (JENSEN, 1991).
Most of the advantages of CP-nets are subjective by nature and cannot be proved in any formal way. A list of advantages is presented below, accompanied by brief explanations and justifications (JENSEN, 1997).
1. CP-nets have a graphical representation - The graphical form is intuitively very
appealing. It is extremely easy to understand and grasp, even for people who are not
very familiar with the details of CP-nets. This is due to the fact that CPN diagrams
resemble many of the informal drawings which designers and engineers make while
they construct and analyze a system.
2. CP-nets have a well-defined semantics which unambiguously defines the be-
havior of each CP-net - It is the presence of the semantics which makes it possible
to implement simulators for CP-nets, and it is also the semantics which forms the
foundation for the formal analysis methods.
3. CP-nets are very general and can be used to describe a large variety of dif-
ferent systems - The applications of CP-nets range from informal systems (such
as the description of work processes) to formal systems (such as communication
protocols). They also range from software systems (such as distributed algorithms)
to hardware systems (such as VLSI chips). Finally, they range from systems with
a lot of concurrent processes (such as flexible manufacturing) to systems with no
concurrency (such as sequential algorithms).
4. CP-nets have very few, but powerful, primitives - The definition of CP-nets is
rather short and it builds upon standard concepts which many system modelers
already know from mathematics and programming languages. This means that it
is relatively easy to learn to use CP-nets. However, the small number of primitives
also means that it is much easier to develop strong analysis methods.
5. CP-nets have an explicit description of both states and actions - This is in con-
trast to most system description languages which describe either the states or the
actions, but not both. Using CP-nets, it is easily possible to change the point of fo-
cus during the work. At some instances of time it may be convenient to concentrate
on the states (and almost forget about the actions) while at other instances it may be
more convenient to concentrate on the actions (and almost forget about the states).
6. CP-nets have a semantics which builds upon true concurrency, instead of interleaving - This means that the notions of conflict, concurrency and causal dependency can be defined in a very natural and straightforward way. In an interleaving semantics, it is impossible to have two actions in the same step, and thus concurrency only means that the actions can occur after each other, in any order.
7. CP-nets offer hierarchical descriptions - This means that it is possible to construct a large CP-net by relating smaller CP-nets to each other, in a well-defined way. The hierarchy constructs of CP-nets play a role similar to that of the subroutines, procedures and modules of programming languages, and it is the existence of hierarchical CP-nets which makes it possible to model very large systems in a manageable and modular way.
8. CP-nets integrate the description of control and synchronization with the description of data manipulation - This means that on a single sheet of paper it can be seen what the environment, enabling conditions and effects of an action are. Many other graphical description languages work with graphs which only describe the environment of an action, while the detailed behavior is specified separately (often by means of unstructured prose).
9. CP-nets are stable towards minor changes of the modeled system - This is proved by many practical experiences and it means that small modifications of the modeled system do not completely change the structure of the CP-net. In particular, it should be observed that this is also true when a number of subnets describing different sequential processes are combined into a large CP-net. In many other description languages, e.g., finite automata, such a combination often yields a description which is difficult to relate to the original sub-descriptions.
10. CP-nets offer interactive simulations where the results are presented directly on the CPN diagram - The simulation makes it possible to debug a large model while it is being constructed, analogously to a good programmer debugging the individual parts of a program as he finishes them. The colours of the moving tokens can be inspected.
11. CP-nets have a large number of formal analysis methods by which proper-
ties of CP-nets can be proved - There are four basic classes of formal analysis
methods: construction of occurrence graphs (representing all reachable markings),
calculation and interpretation of system invariants (called place and transition in-
variants), reductions (which shrink the net without changing a certain selected set
of properties) and checking of structural properties (which guarantee certain behav-
ioral properties).
12. CP-nets have computer tools supporting their drawing, simulation and formal
analysis - This makes it possible to handle even large nets without drowning in de-
tails and without making trivial calculation errors. The existence of such computer
tools is extremely important for the practical use of CP-nets.
CP-nets can be analysed in four different ways. The first analysis method is interactive simulation. It is very similar to debugging and prototyping. This means that we can execute a CPN model to make a detailed investigation of the behaviour of the modelled system. It is possible to set breakpoints and to visualise the simulation results by means of different kinds of graphics.

The second analysis method is automatic simulation, which is similar to program execution. It allows a fast execution of thousands or millions of transitions. The purpose is to investigate the functional correctness of the system or to investigate the performance of the system, e.g. to identify bottlenecks, to predict the use of buffer space or the mean/maximal service time (JENSEN, 1991).
The third analysis method is occurrence graphs (also called state spaces or reachability graphs). The basic idea behind occurrence graphs is to construct a directed graph which has a node for each reachable system state and an arc for each possible state change. Obviously, such a graph may become very large, even for small CP-nets. However, it can be constructed and analysed totally automatically, and there exist techniques which make it possible to work with condensed occurrence graphs without losing analytic power (JENSEN, 1997). These techniques build upon equivalence classes.
The fourth analysis method is place invariants. This method is very similar to the
use of invariants in ordinary program verification. The user constructs a set of equations
which is proved to be satisfied for all reachable system states. The equations are used to
prove properties of the modelled system.
4.3 MPhyScaS - Petri Net Model

As already stated, a simulation involving a set of different physical phenomena demands a great amount of time. In order to improve MPhyScaS simulation performance, a model based on Petri nets is proposed for the practical analysis of simulation and simulator data. This model makes use of hierarchical parallelization concepts to identify which processes (instances) of each MPhyScaS layer can be executed in parallel. The following section presents this model.
Definition 4.3.1 (MPhyScaS Layers Architecture - Petri net model) Each layer in the MPhyScaS architecture is modeled through a set of places, described as follows:
P_k is the set of places representing the Kernel (such a set has a single place p_k);
P_b is the set of places representing the Blocks;
P_alg is the set of places representing the Algorithms;
P_g is the set of places representing the Groups;
P_gt is the set of places representing the GroupTasks;
P_ph is the set of places representing the Phenomena;
P_q is the set of places representing the Quantities;
P_s is the set of places representing the States.
In Chapter 2 we said that the Kernel represents the overall scenery of the simulation and that it stores system data related to the parameters of its loops and iterations. We also said that in a simulation only one Kernel can exist. Hence, the relationship between Block and Kernel is as presented in Definition 4.3.2.
Definition 4.3.2 (Block and Kernel Relationship - Petri Net Model) Let T_bk be a set of transitions, such that:
1. ⋃_{t_bk ∈ T_bk} O(t_bk) = P_k
2. ⋃_{t_bk ∈ T_bk} I(t_bk) = P_b
3. #P_b = #T_bk
4. let p_b1, p_b2 ∈ P_b, p_b1 ≠ p_b2; then ∄ t_bk ∈ T_bk | (p_b1 ∈ I(t_bk)) ∧ (p_b2 ∈ I(t_bk))

Assuming P_k = {p_k} and P_b = {p_b1, p_b2, p_b3}.
Based on Definition 4.3.2, Petri net PN_1 creates a valid relationship between Block and Kernel (see the graphical representation in Figure 4.2(a)). In contrast, Petri net PN_2 violates rule 4 of Definition 4.3.2: transition t_bk2 has both p_b2 and p_b3 as input places. A graphical representation of Petri nets PN_1 and PN_2 is depicted in Figure 4.2.
PN_1 = {
    {p_k, p_b1, p_b2, p_b3},
    {t_bk1, t_bk2, t_bk3},
    {{p_b1}, {p_b2}, {p_b3}},
    {{p_k}, {p_k}, {p_k}},
    {},
    {0, 0, 0, 0}
}

PN_2 = {
    {p_k, p_b1, p_b2, p_b3},
    {t_bk1, t_bk2, t_bk3},
    {{p_b1}, {p_b2, p_b3}, {}},
    {{p_k}, {p_k}, {p_k}},
    {},
    {0, 0, 0, 0}
}
The rules in Definition 4.3.2 exist because the Petri net must represent the relationships correctly, and Petri net PN_2 does not represent all relationships correctly. Besides the fact that PN_2 does not represent a single request from the Kernel properly (in PN_2 the Kernel asks Blocks 2 and 3 to execute something using only one request), the structure of PN_2 would not permit us to analyze each path that a single request goes through.
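To make the rule concrete, a minimal sketch (our own representation, not MPhyScaS code) that checks rule 4 of Definition 4.3.2, i.e., that no transition has more than one Block place among its inputs, applied to the input-place sets of PN_1 and PN_2:

```python
# Each net is represented here only by the Block input-place sets of its transitions,
# taken directly from the tuples of PN_1 and PN_2 above.
PN1_inputs = [{"p_b1"}, {"p_b2"}, {"p_b3"}]
PN2_inputs = [{"p_b1"}, {"p_b2", "p_b3"}, set()]

def satisfies_rule4(input_sets):
    """Rule 4 of Definition 4.3.2: no transition may have more than one Block input place."""
    return all(len(s) <= 1 for s in input_sets)

print(satisfies_rule4(PN1_inputs))  # True: PN_1 is valid
print(satisfies_rule4(PN2_inputs))  # False: t_bk2 has both p_b2 and p_b3 as inputs
```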
Figure 4.2: Graphical representation of Petri nets PN_1 and PN_2.

Definition 4.3.3 (Algorithm and Block Relationship - Petri Net Model) Let T_algb be a set of transitions, such that:
1. ⋃_{t_algb ∈ T_algb} O(t_algb) = P_b
2. ⋃_{t_algb ∈ T_algb} I(t_algb) = P_alg
3. #P_alg = #T_algb
4. let p_alg1, p_alg2 ∈ P_alg, p_alg1 ≠ p_alg2; then ∄ t_algb ∈ T_algb | (p_alg1 ∈ I(t_algb)) ∧ (p_alg2 ∈ I(t_algb))
5. let p_b1, p_b2 ∈ P_b, p_b1 ≠ p_b2; then ∄ t_algb ∈ T_algb | (p_b1 ∈ O(t_algb)) ∧ (p_b2 ∈ O(t_algb))

Assuming P_b = {p_b1, p_b2} and P_alg = {p_alg1, p_alg2, p_alg3}.
On the basis of Definition 4.3.3, a valid Petri net is represented by PN_3, which is graphically represented in Figure 4.3(a). Petri net PN_4 violates rules 4 and 5 of Definition 4.3.3: there are two input places (p_alg1 and p_alg2) for the transition t_algb1 and two output places (p_b1 and p_b2) for the transition t_algb3.
PN_3 = {
    {p_b1, p_b2, p_alg1, p_alg2, p_alg3},
    {t_algb1, t_algb2, t_algb3},
    {{p_alg1}, {p_alg2}, {p_alg3}},
    {{p_b1}, {p_b1}, {p_b2}},
    {},
    {0, 0, 0, 0, 0}
}

PN_4 = {
    {p_b1, p_b2, p_alg1, p_alg2, p_alg3},
    {t_algb1, t_algb2, t_algb3},
    {{p_alg1, p_alg2}, {}, {p_alg3}},
    {{p_b1}, {p_b1}, {p_b1, p_b2}},
    {},
    {0, 0, 0, 0, 0}
}
Figure 4.3: Graphical representation of Petri nets PN_3 and PN_4.
Besides the fact that Petri net PN_4 does not represent a single request from the Block properly (in PN_4, Block 1 asks Algorithms 1 and 2 to execute something using only one request), the structure of PN_4 would not permit us to analyze each path that a single request goes through.
Definition 4.3.4 (Group and Algorithm Relationship - Petri Net Model) Let T_galg be a set of transitions, such that:
1. ⋃_{t_galg ∈ T_galg} O(t_galg) = P_alg
2. ⋃_{t_galg ∈ T_galg} I(t_galg) = P_g
3. #P_g ≤ #T_galg
4. let p_g1, p_g2 ∈ P_g, p_g1 ≠ p_g2; then ∄ t_galg ∈ T_galg | (p_g1 ∈ I(t_galg)) ∧ (p_g2 ∈ I(t_galg))
5. let p_alg1, p_alg2 ∈ P_alg, p_alg1 ≠ p_alg2; then ∄ t_galg ∈ T_galg | (p_alg1 ∈ O(t_galg)) ∧ (p_alg2 ∈ O(t_galg))
6. let p_b1 ∈ P_b, p_alg1 ∈ P_alg, p_g1 ∈ P_g; for every t_galg with p_g1 ∈ I(t_galg) and p_alg1 ∈ O(t_galg), and for the t_algb with p_alg1 ∈ I(t_algb), it holds that O(t_algb) = {p_b1} (i.e., all Algorithms supported by the same Group must belong to the same Block p_b1)

Assuming P_b = {p_b1}, P_alg = {p_alg1, p_alg2} and P_g = {p_g1, p_g2}.
Petri net PN_5 shows an example of a valid Petri net on the basis of Definition 4.3.4. The graphical representation is depicted in Figure 4.4(a).
PN_5 = {
    {p_b1, p_alg1, p_alg2, p_g1},
    {t_algb1, t_algb2, t_galg1, t_galg2},
    {{p_alg1}, {p_alg2}, {p_g1}, {p_g1}},
    {{p_b1}, {p_b1}, {p_alg1}, {p_alg2}},
    {},
    {0, 0, 0, 0}
}
It is possible to see in Petri net PN_6 the violation of rules 4 and 5 of Definition 4.3.4. The transition t_galg2 has two input places (p_g2 and p_g3) and the transition t_galg1 has two output places (p_alg1 and p_alg2), which characterize the two violations, respectively.
Petri net PN_7 is an example of a rule 6 violation in Definition 4.3.4. Here, p_g1 provides support to Algorithms of different Blocks (the Algorithms represented by the places p_alg1 and p_alg2 belong to the Blocks represented by the places p_b1 and p_b2, respectively). Figure 4.4 shows the graphical representation of Petri nets PN_5, PN_6 and PN_7.
PN_6 = {
    {p_alg1, p_alg2, p_alg3, p_g1, p_g2, p_g3},
    {t_galg1, t_galg2, t_galg3},
    {{p_g1}, {p_g2, p_g3}, {}},
    {{p_alg1, p_alg2}, {p_alg2}, {p_alg3}},
    {},
    {0, 0, 0, 0, 0, 0}
}

PN_7 = {
    {p_b1, p_b2, p_alg1, p_alg2, p_g1},
    {t_algb1, t_algb2, t_galg1, t_galg2},
    {{p_alg1}, {p_alg2}, {p_g1}, {p_g1}},
    {{p_b1}, {p_b2}, {p_alg1}, {p_alg2}},
    {},
    {0, 0, 0, 0, 0}
}
Besides the fact that Petri net PN_6 does not represent a single request from the Algorithm properly (in PN_6, Algorithm 2 asks Groups 2 and 3 to execute something using only one request), the structure of PN_6 would not permit us to analyze each path that a single request goes through. Another error in the PN_6 representation is that Algorithms 1 and 2 ask Group 1 to execute simultaneously, which is impossible in the MPhyScaS architecture definition.

In Petri net PN_7, two different Algorithms (Algorithms 1 and 2) belonging to different Blocks (Blocks 1 and 2, respectively) ask for the execution of the same Group (Group 1). This is also impossible in the MPhyScaS architecture definition.
Figure 4.4: Graphical representation of Petri nets PN_5, PN_6 and PN_7.
Definition 4.3.5 (GroupTask and Group Relationship - Petri Net Model) Let T_gtg be a set of transitions, such that:
1. ⋃_{t_gtg ∈ T_gtg} O(t_gtg) = P_g
2. ⋃_{t_gtg ∈ T_gtg} I(t_gtg) = P_gt
3. #P_gt = #T_gtg
4. let p_gt1, p_gt2 ∈ P_gt, p_gt1 ≠ p_gt2; then ∄ t_gtg ∈ T_gtg | (p_gt1 ∈ I(t_gtg)) ∧ (p_gt2 ∈ I(t_gtg))
5. let p_g1, p_g2 ∈ P_g, p_g1 ≠ p_g2; then ∄ t_gtg ∈ T_gtg | (p_g1 ∈ O(t_gtg)) ∧ (p_g2 ∈ O(t_gtg))

Assuming P_g = {p_g1, p_g2} and P_gt = {p_gt1, p_gt2, p_gt3}.
One can note that Petri net PN_8 is valid on the basis of Definition 4.3.5 (see the graphical representation in Figure 4.5(a)), while PN_9 violates rules 4 and 5 of the same definition. In the first violation of PN_9, the transition t_gtg1 has two input places (p_gt1 and p_gt2), and in the second one, the transition t_gtg3 has two output places (p_g1 and p_g2). The graphical representations of Petri nets PN_8 and PN_9 are depicted in Figure 4.5.
PN_8 = {
    {p_g1, p_g2, p_gt1, p_gt2, p_gt3},
    {t_gtg1, t_gtg2, t_gtg3},
    {{p_gt1}, {p_gt2}, {p_gt3}},
    {{p_g1}, {p_g1}, {p_g2}},
    {},
    {0, 0, 0, 0, 0}
}

PN_9 = {
    {p_g1, p_g2, p_gt1, p_gt2, p_gt3},
    {t_gtg1, t_gtg2, t_gtg3},
    {{p_gt1, p_gt2}, {}, {p_gt3}},
    {{p_g1}, {p_g1}, {p_g1, p_g2}},
    {},
    {0, 0, 0, 0, 0}
}
Figure 4.5: Graphical representation of Petri nets PN_8 and PN_9.
Besides the fact that Petri net PN_9 does not represent a single request from the Group properly (in PN_9, Group 1 asks GroupTasks 1 and 2 to execute something using only one request), the structure of PN_9 would not permit us to analyze each path that a single request goes through.
Definition 4.3.6 (Phenomenon and GroupTask Relationship - Petri Net Model) Let T_phgt be a set of transitions, such that:
1. ⋃_{t_phgt ∈ T_phgt} O(t_phgt) = P_gt
2. ⋃_{t_phgt ∈ T_phgt} I(t_phgt) = P_ph
3. #P_ph ≤ #T_phgt
4. let p_ph1, p_ph2 ∈ P_ph, p_ph1 ≠ p_ph2; then ∄ t_phgt ∈ T_phgt | (p_ph1 ∈ I(t_phgt)) ∧ (p_ph2 ∈ I(t_phgt))
5. let p_gt1, p_gt2 ∈ P_gt, p_gt1 ≠ p_gt2; then ∄ t_phgt ∈ T_phgt | (p_gt1 ∈ O(t_phgt)) ∧ (p_gt2 ∈ O(t_phgt))
6. let p_g1 ∈ P_g, p_gt1 ∈ P_gt, p_ph1 ∈ P_ph; for every t_phgt with p_ph1 ∈ I(t_phgt) and p_gt1 ∈ O(t_phgt), and for the t_gtg with p_gt1 ∈ I(t_gtg), it holds that O(t_gtg) = {p_g1} (i.e., all GroupTasks supported by the same Phenomenon must belong to the same Group p_g1)

Assuming P_g = {p_g1}, P_gt = {p_gt1, p_gt2} and P_ph = {p_ph1, p_ph2}.
On the basis of Definition 4.3.6, Petri net PN_10 (see Figure 4.6(a)) is valid.
PN_10 = {
    {p_g1, p_gt1, p_gt2, p_ph1, p_ph2},
    {t_gtg1, t_gtg2, t_phgt1, t_phgt2, t_phgt3},
    {{p_gt1}, {p_gt2}, {p_ph1}, {p_ph2}, {p_ph2}},
    {{p_g1}, {p_g1}, {p_gt1}, {p_gt1}, {p_gt2}},
    {},
    {0, 0, 0, 0, 0}
}
It is possible to note that in Petri net PN_11 there is a transition (t_phgt1) that has two input places (p_ph1 and p_ph2). Similarly, the transition t_phgt3 has two output places (p_gt1 and p_gt2). Thus, Petri net PN_11 violates rules 4 and 5 of Definition 4.3.6, respectively.

Petri net PN_12 violates rule 6 of Definition 4.3.6. Here, p_ph1 provides support to two GroupTasks that belong to different Groups (the GroupTasks represented by the places p_gt1 and p_gt2 belong to the Groups represented by the places p_g1 and p_g2, respectively). Petri nets PN_10, PN_11 and PN_12 are depicted in Figure 4.6.
PN_11 = {
    {p_gt1, p_gt2, p_ph1, p_ph2, p_ph3},
    {t_phgt1, t_phgt2, t_phgt3},
    {{p_ph1, p_ph2}, {}, {p_ph3}},
    {{p_gt1}, {p_gt1}, {p_gt1, p_gt2}},
    {},
    {0, 0, 0, 0, 0}
}

PN_12 = {
    {p_g1, p_g2, p_gt1, p_gt2, p_ph1},
    {t_gtg1, t_gtg2, t_phgt1, t_phgt2},
    {{p_gt1}, {p_gt2}, {p_ph1}, {p_ph1}},
    {{p_g1}, {p_g2}, {p_gt1}, {p_gt2}},
    {},
    {0, 0, 0, 0, 0}
}
Figure 4.6: Graphical representation of Petri nets PN_10, PN_11 and PN_12.
Besides the fact that Petri net PN_11 does not represent a single request from the GroupTask properly (in PN_11, GroupTask 1 asks Phenomena 1 and 2 to execute something using only one request), the structure of PN_11 would not permit us to analyze each path that a single request goes through.

In Petri net PN_12, two different GroupTasks (GroupTasks 1 and 2) belonging to different Groups (Groups 1 and 2, respectively) ask for the execution of the same Phenomenon (Phenomenon 1). This is impossible in the MPhyScaS architecture definition.
Definition 4.3.7 (Quantity and Phenomenon Relationship - Petri Net Model) Let T_qph be a set of transitions, such that:
1. ⋃_{t_qph ∈ T_qph} O(t_qph) = P_ph
2. ⋃_{t_qph ∈ T_qph} I(t_qph) = P_q
3. #P_q = #T_qph
4. let p_q1, p_q2 ∈ P_q, p_q1 ≠ p_q2; then ∄ t_qph ∈ T_qph | (p_q1 ∈ I(t_qph)) ∧ (p_q2 ∈ I(t_qph))
5. let p_ph1, p_ph2 ∈ P_ph, p_ph1 ≠ p_ph2; then ∄ t_qph ∈ T_qph | (p_ph1 ∈ O(t_qph)) ∧ (p_ph2 ∈ O(t_qph))

Assuming P_ph = {p_ph1, p_ph2} and P_q = {p_q1, p_q2, p_q3}.
Based on Definition 4.3.7, Petri net PN_13 creates a valid relationship between Quantity and Phenomenon (see the graphical representation in Figure 4.7(a)), while PN_14 violates rules 4 and 5 of the same definition. In PN_14, the transition t_qph1 has two input places (p_q1 and p_q2) and the transition t_qph3 has two output places (p_ph1 and p_ph2).
PN_13 = {
    {p_ph1, p_ph2, p_q1, p_q2, p_q3},
    {t_qph1, t_qph2, t_qph3},
    {{p_q1}, {p_q2}, {p_q3}},
    {{p_ph1}, {p_ph1}, {p_ph2}},
    {},
    {0, 0, 0, 0, 0}
}

PN_14 = {
    {p_ph1, p_ph2, p_q1, p_q2, p_q3},
    {t_qph1, t_qph2, t_qph3},
    {{p_q1, p_q2}, {}, {p_q3}},
    {{p_ph1}, {p_ph1}, {p_ph1, p_ph2}},
    {},
    {0, 0, 0, 0, 0}
}
Figure 4.7: Graphical representation of Petri nets PN_13 and PN_14.

Besides the fact that Petri net PN_14 does not represent a single request from the Phenomenon properly (in PN_14, Phenomenon 1 asks Quantities 1 and 2 to execute something using only one request), the structure of PN_14 would not permit us to analyze each path that a single request goes through.
Quantities are responsible for assembling global states (matrices, vectors and scalars), so there will be a connection from a global state to a quantity when that quantity is responsible for the assembling of that global state. In order to characterize a coupling, when it exists, there will be a connection from the quantity to the global state; this characterizes that the quantity needs the global state to be already assembled. Therefore, the global states are the elements at the bottom of our model.
4.4 Model Analysis
From the model defined, we realize that the tokens should be valued, because they represent the task that is executing at that moment. Therefore, we refined the model to become a Coloured Petri Net (CPN) model. For this purpose, a new place is first created and it is connected through a transition to all places that represent global states (the places at the bottom of the model). This new place will contain a token, which will be the single token in the net we created. Secondly, all transitions will have a restriction so that only tokens with a certain value can fire them. The token value allowed to fire a given transition is a string created through the composition of all place names in the path from the highest level element (Kernel) down to the transition. For example, the transition that restricts the token to be 0.2.1 is the transition between Block 2 and Algorithm 1, where Block 2 is related to the Kernel, which has code 0.

When the model becomes a CP-net, we use CPN Tools (CPNGROUP, 2009) to analyze it. We use the occurrence graphs method (also called state spaces or reachability graphs) to perform the analysis. In other words, before beginning the analysis, it is necessary to generate the state space of the net. Then, the analysis is as presented in Pseudo-Code 2.
Pseudo-Code 2: Model analysis pseudo-code.
1  foreach node of the state space do
2      boolean dependence = Verify Dependence(node)  // verify whether there is more than one token and whether they are at the same level
3      if dependence then
4          Write In File(node)
5      end
6  end
Line 1 of Pseudo-Code 2 shows that each node of the state space, i.e., each reachable system state, will be analyzed. In line 2 it is verified whether there is any dependence in the node. For this, it is verified whether there is more than one token in the net and whether the tokens are at the same level of the MPhyScaS architecture. If a dependence exists, the node, indicating the places that contain the tokens, will be written to a file (lines 3 to 5).
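A minimal sketch of the dependence check performed for each state-space node is given below; the node representation and the way the architectural level is extracted from a place name are our own assumptions, not CPN Tools code:

```python
# Hypothetical sketch of Verify Dependence: a node is a list of (place, token) pairs,
# and level_of(place) returns the MPhyScaS level of that place (e.g. "Quantity").
def level_of(place):
    return place.split("_")[0]          # e.g. "Quantity_1" -> "Quantity" (our naming convention)

def verify_dependence(node):
    """There is a dependence if more than one token is present and at least two of
    them sit in places of the same architectural level."""
    if len(node) <= 1:
        return False
    levels = [level_of(place) for place, _ in node]
    return len(levels) != len(set(levels))

node = [("Quantity_0", "0.0.0.0"), ("Quantity_1", "1.1.0.1")]
print(verify_dependence(node))          # True: two tokens at the Quantity level
```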
To illustrate the analysis, Figure 4.8 shows a part of a Petri net that represents the execution of two GroupTasks. GroupTask 0 (GT 0) requires Phenomenon 0 (Ph 0) to execute Quantity 0 (q 0), which assembles the global state 0 (S 0). In the same Petri net, GroupTask 1 (GT 1) requires Phenomenon 0 (Ph 0) to execute Quantity 1 (q 1), which needs global state 0 (S 0) to be already assembled in order to be able to assemble the global state 1 (S 1).

In order to make the model analysis easier, we invert the direction of each arc in the Petri net, as shown in Figure 4.9. This helps us to identify the entire path through which a requirement of a task execution has passed. In this way, the analysis begins at the execution of a task (State level) and goes through the levels, following a bottom-up direction, until it reaches the top level (Kernel level).
Figure 4.8: An example of a Petri net that represents the execution of two GroupTasks.

Figure 4.9: The Petri net with arcs direction inverted.

The first task to execute is represented by a token with value 0.0.0.0. This token is in
the place that represents the global state 0 (S 0) because this is the global state assembled
by the task 0.0.0.0 which is the first task of the simulation (see Figure 4.10(a)). This
marking enables the transition that makes the token go to the place that represents the
Quantity 0. This is the quantity that assembles the global state 0.
When the token is in the place that represents Quantity 0 (Figure 4.10(b)), it enables the transition that makes the token go to the place that represents Phenomenon 0. In the next step, the token is in the place that represents Phenomenon 0 (as depicted in Figure 4.10(c)). It enables the transition that makes the token go to the place that represents GroupTask 0.
Figure 4.10: The Petri net representing the execution of the first executed task.

The analysis continues until the Kernel level is reached. When this occurs, the token
assumes the value of the next task of the simulation. In the case of the example, the token assumes the value 1.1.0.1. It is placed in the global state 1, which is the next global state to be assembled in the simulation. Figure 4.11(a) shows the token in the place that represents global state 1. In this case, the token enables the transition that makes the token go to the place that represents Quantity 1.

When the transition is fired and the token is in the Quantity 1 place, the token enables two transitions to fire. The first transition is the one which makes the token go to the place that represents Phenomenon 0. The second transition is the one which makes the token go to the place that represents the global state 0. Figure 4.11(b) depicts this situation.
The dependencies are identified when a token enables two transitions. In this case, we examine all of the state space constructed from this moment on. The analysis searches for objects that belong to a level that executes something. These levels are the Kernel, Algorithm, QuantityTask, GroupTask and Quantity levels; the other levels only forward a requirement, executing nothing. There is a dependence between the objects found at the same level: the object that came from an already executed task must be executed before the other one. In the example, the analysis finds two dependences: Quantity 1 depends on Quantity 0 for executing, and GroupTask 1 depends on GroupTask 0 for executing.
Figure 4.11: The Petri net representing the execution of the second executed task.

In the Coloured Petri Net generated, each transition restricts the token value that can
enable and fire it. This is important because the analysis does not identify a dependence when an object depends on a second object and that second object depends on a third one; in other words, the analysis does not identify a dependence between the first object and the third one.
Chapter 5
Proposed Algorithm
This chapter introduces the concepts of Genetic Algorithms (GA) and presents a new GA approach. This new approach uses the dependencies found in the CPN analysis and also considers some new effects in the scheduling process, aiming at solving the MPhyScaS scheduling problem.
5.1 Genetic Algorithms
Evolutionary computation (EC) has become a standard term to indicate problem-solving techniques which use design principles inspired by models of the natural evolution of species.

Historically, there are three main algorithmic developments within the field of EC: evolution strategies (BEYER, 2001), evolutionary programming (ENGELBRECHT, 2007), and genetic algorithms (COLEY, 1998). Common to these approaches is that they are population-based algorithms that use operators inspired by population genetics to explore the search space (DORIGO; STüTZLE, 2004). Each individual in the algorithm represents, directly or indirectly, a solution to the problem under consideration.
Differences among the various EC algorithms concern the particular representations chosen for the individuals and the way the genetic operators are implemented. For example, genetic algorithms typically use binary or discrete-valued variables to represent the information in individuals and favor the use of recombination, while evolution strategies and evolutionary programming often use continuous variables and put more emphasis on the mutation operator. Nevertheless, the differences between the different paradigms are becoming more and more blurred (DORIGO; STüTZLE, 2004).
A general outline of an EC algorithm is given in Pseudo-Code 3, where pop denotes the population of individuals. To define an EC algorithm, the following functions have to be specified:
The function Initialize Population, which generates the initial population;
The function Evaluate Population, which computes the fitness values of the
individuals;
The function Best Of Population, which returns the best individual in the cur-
rent population;
The function Select and Recombine, which repeatedly selects two or more in-
dividuals and combine them to form one or more new individuals;
The function Mutate, which introduces a small random perturbation, when applied
to one individual;
Pseudo-Code 3: Evolutionary computation algorithm pseudo-code.
pop ← Initialize Population
Evaluate Population(pop)
sbest ← Best Of Population(pop)
while termination condition not met do
    pop ← Select And Recombine(pop)
    pop′ ← Mutate(pop)
    Evaluate Population(pop′)
    s ← Best Of Population(pop′)
    if f(s) < f(sbest) then
        sbest ← s
    end
    pop ← pop′
end
return sbest
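A minimal executable sketch following Pseudo-Code 3 is given below; the representation, fitness function and operators are placeholders chosen only to make the skeleton runnable, not the operators proposed in this dissertation:

```python
import random

def run_ec(fitness, init, select_and_recombine, mutate, generations=100):
    """Generic EC skeleton following Pseudo-Code 3 (minimization)."""
    pop = init()
    best = min(pop, key=fitness)
    for _ in range(generations):
        pop = select_and_recombine(pop, fitness)
        pop = [mutate(ind) for ind in pop]
        candidate = min(pop, key=fitness)
        if fitness(candidate) < fitness(best):
            best = candidate
    return best

# Placeholder problem: minimize the sum of a bit string (optimum is all zeros).
fitness = sum
init = lambda: [[random.randint(0, 1) for _ in range(10)] for _ in range(20)]

def select_and_recombine(pop, fitness):
    parents = sorted(pop, key=fitness)[:10]               # truncation selection
    children = []
    while len(children) < 20:
        a, b = random.sample(parents, 2)
        cut = random.randrange(1, len(a))
        children.append(a[:cut] + b[cut:])                # one-point crossover
    return children

def mutate(ind, rate=0.05):
    return [1 - g if random.random() < rate else g for g in ind]

print(run_ec(fitness, init, select_and_recombine, mutate))
```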
The basic principles of GA were first proposed by Holland (HOLLAND, 1975). Thereafter, a series of literature and reports became available. GA is inspired by the mechanism of natural selection, where stronger individuals are likely to be the winners in a competing environment. Here, GA uses a direct analogy of such natural evolution. Through the genetic evolution method, an optimal solution can be found and represented by the final winner of the genetic game.
GA presumes that a potential solution of any problem is an individual and can be represented by a set of parameters. These parameters are regarded as the genes of a chromosome and can be structured as a string of values in binary form. A positive value, generally known as the fitness value, is used to reflect the degree of goodness of the chromosome for the problem, and it is highly related to the chromosome's objective value (MAN; TANG; KWONG, 1999).
Throughout a genetic evolution, the fitter chromosome has a tendency to yield good-quality offspring, which means a better solution to the problem (BANZHAF et al., 1998). In a practical GA application, a population pool of chromosomes has to be installed, and these can be randomly set initially. The size of this population varies from one problem to another (LOBO; LIMA, 2005). In each cycle of genetic operation, termed an evolving process, a subsequent generation is created from the chromosomes in the current population. This can only succeed if a group of these chromosomes, generally called parents, is selected via a specific selection routine. The genes of the parents are mixed and recombined for the production of offspring in the next generation. It is expected that, from this process of evolution (manipulation of genes), the better chromosomes will create a larger number of offspring and thus have a higher chance of surviving in the subsequent generation, emulating the survival-of-the-fittest mechanism in nature (MAN; TANG; KWONG, 1999).
The cycle of evolution is repeated until a desired termination criterion is reached. This criterion can be a number of evolution cycles (computational runs), the amount of variation of individuals between different generations, or a pre-defined value of fitness.
5.1.1 Initial Population
The major questions to consider are, firstly, the size of the population and, secondly, the method by which the individuals are chosen. The choice of the population size has been approached from several theoretical points of view, although the underlying idea is always a trade-off between efficiency and effectiveness. Intuitively, it would seem that there should be some optimal value for a given string length, on the grounds that too small a population would not allow sufficient room for exploring the search space effectively, while too large a population would so impair the efficiency of the method that no solution could be expected in a reasonable amount of computation (REEVES; YAMADA, 1998).
5.1.2 Selection
During each successive generation, a proportion of the existing population is selected to
breed a new generation. Individual solutions are selected through a fitness-based process,
where fitter solutions (as measured by a fitness function) are typically more likely to be
selected. Certain selection methods rate the fitness of each solution and preferentially
select the best solutions. Other methods rate only a random sample of the population, as
this process may be very time-consuming (BANZHAF et al., 1998).
Most functions are stochastic and designed so that a small proportion of less fit so-
lutions are selected. This helps keep the diversity of the population large, preventing
premature convergence on poor solutions (REZENDE, 2002). Popular and well-studied
selection methods include fitness-proportionate selection (also known as roulette wheel
selection) and tournament selection.
5.1.2.1 Fitness-Proportionate Selection
Fitness-proportionate selection, also known as roulette wheel selection, is a genetic opera-
tor used in genetic algorithms for selecting potentially useful solutions for recombination.
In fitness-proportionate selection, as in all selection methods, the fitness function assigns a fitness to possible solutions or chromosomes. This fitness level is used to associate a probability of selection with each individual chromosome (see Figure 5.1). If $f_i$ is the fitness of individual $i$ in the population, its probability of being selected is

$$p_i = \frac{f_i}{\sum_{j=1}^{N} f_j},$$
where N is the number of individuals in the population. While candidate solutions
with a higher fitness will be less likely to be eliminated, there is still a chance that they
may be (REZENDE, 2002). Contrast this with a less sophisticated selection algorithm,
such as truncation selection, which will eliminate a fixed percentage of the weakest candi-
dates. With fitness-proportionate selection there is a chance some weaker solutions may
survive the selection process; this is an advantage, as though a solution may be weak,
it may include some component which could prove useful following the recombination
process (BANZHAF et al., 1998).
Figure 5.1: Fitness-proportionate selection example.
The analogy to a roulette wheel can be envisaged by imagining a roulette wheel in which each candidate solution represents a pocket on the wheel; the size of each pocket is proportionate to the probability of selection of the solution. Selecting N chromosomes from the population is equivalent to playing N games on the roulette wheel, as each candidate is drawn independently (BANZHAF et al., 1998).
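To make the mechanism concrete, the following minimal Python sketch implements fitness-proportionate selection as described above. The function name and the population representation are illustrative assumptions, and non-negative fitness values are assumed (for a minimisation problem such as scheduling, the raw objective would first have to be converted into a non-negative goodness score).

import random

def roulette_wheel_select(population, fitnesses):
    """Select one individual with probability proportional to its fitness."""
    total = sum(fitnesses)                       # total size of the wheel
    pick = random.uniform(0, total)              # spin the wheel
    cumulative = 0.0
    for individual, fitness in zip(population, fitnesses):
        cumulative += fitness
        if pick <= cumulative:                   # pocket containing the pick
            return individual
    return population[-1]                        # guard against rounding errors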
5.1.2.2 Tournament Selection
Tournament selection is one of many selection methods used in genetic algorithms. It runs a tournament among a few individuals chosen at random from the population and selects the winner (the one with the best fitness) for crossover. The selection pressure is easily adjusted by changing the tournament size: if the tournament size is larger, weak individuals have a smaller chance of being selected. Figure 5.2 gives an example where four individuals are selected for the tournament.
Deterministic tournament selection always selects the best individual of a tournament (selection probability p = 1). A 1-way tournament (k = 1) is equivalent to random selection.
Figure 5.2: Tournament selection example.
The chosen individual can be removed from the population that the selection is made from, if desired; otherwise, individuals can be selected more than once for the next generation (REEVES; YAMADA, 1998).
Tournament selection has several benefits: it is efficient to code, works on parallel architectures, and allows the selection pressure to be easily adjusted (ZALZALA; FLEMING, 1997).
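A minimal Python sketch of tournament selection is given below; the tournament size k is the parameter that controls the selection pressure, and the function name and defaults are illustrative assumptions (the default assumes a minimisation problem, as in the scheduling case).

import random

def tournament_select(population, fitnesses, k=4, minimize=True):
    """Run a k-way tournament and return the winner."""
    contenders = random.sample(range(len(population)), k)   # k distinct indices
    pick = min if minimize else max                          # best w.r.t. fitness
    winner = pick(contenders, key=lambda i: fitnesses[i])
    return population[winner]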
5.1.3 Crossover
Crossover is also known as recombination. In the crossover operation, new strings (solutions) are created by exchanging information among the strings present in the population. Recall that a string representation of a solution contains information in its genes. In crossover, two strings are picked from the population and some portions of these strings are exchanged between them.
A common implementation of crossover uses the single-point crossover process, in which a crossing site is randomly chosen along the string length and all bits to the right side of this crossing site are exchanged between the two parent strings (BAGCHI, 1999), as shown in Figure 5.3.
For multipoint crossover, $m$ crossover positions $k_i \in \{1, 2, ..., l-1\}$, where the $k_i$ are the crossover points and $l$ is the length of the chromosome, are chosen at random with no duplicates and sorted into ascending order. Then, the bits between successive crossover points are exchanged between the two parents to produce two new offspring. The section between the first allele position and the first crossover point is not exchanged between individuals (ZALZALA; FLEMING, 1997). This process is illustrated in Figure 5.4.

Figure 5.3: Single-point crossover example.

Figure 5.4: Multipoint crossover example.

The idea behind multipoint crossover, and indeed many of the variations on the crossover operator, is that the parts of the chromosome representation that contribute most to the performance of a particular individual may not necessarily be contained in adjacent substrings (COLEY, 1998). Further, the disruptive nature of multipoint crossover appears to encourage the exploration of the search space, rather than favouring convergence to highly fit individuals early in the search, thus making the search more robust (ZALZALA; FLEMING, 1997).
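A minimal Python sketch of the single-point and multipoint operators described above is shown next; the parents are assumed to be equal-length lists of bits and the function names are illustrative.

import random

def single_point_crossover(p1, p2):
    """Swap the tails of two equal-length chromosomes at one random cut point."""
    cut = random.randint(1, len(p1) - 1)
    return p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]

def multipoint_crossover(p1, p2, m=2):
    """Exchange the segments between successive cut points (m cuts, no duplicates);
    the segment before the first cut is not exchanged."""
    cuts = sorted(random.sample(range(1, len(p1)), m))
    c1, c2 = list(p1), list(p2)
    swap, prev = False, 0
    for cut in cuts + [len(p1)]:
        if swap:
            c1[prev:cut], c2[prev:cut] = c2[prev:cut], c1[prev:cut]
        swap = not swap
        prev = cut
    return c1, c2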
Single and multipoint crossover define cross points as places between loci where a chromosome can be split. Uniform crossover (SYWERDA, 1989) generalises this scheme to make every locus a potential crossover point. A crossover mask, the same length as the chromosome structures, is created at random, and the parity of the bits in the mask indicates which parent will supply the offspring with each bit.
Uniform crossover, like multipoint crossover, has been claimed to reduce the bias associated with the length of the binary representation used and the particular coding for a given parameter set. This helps to overcome the bias in single-point crossover towards short substrings without requiring precise understanding of the significance of individual bits in the chromosome representation (ZALZALA; FLEMING, 1997). Uniform crossover does not use crossover points; instead, it determines a mask which indicates from which parent each gene will be inherited. In Figure 5.5, the value 1 in the mask indicates that the corresponding gene of the first parent will be inherited by the first offspring, and the corresponding gene of the second parent will be inherited by the second offspring. For the value 0 in the mask, the reverse occurs.
Figure 5.5: Uniform crossover example.
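A minimal Python sketch of uniform crossover following the mask convention of Figure 5.5 (a mask bit of 1 copies the gene of the first parent into the first offspring); the function name is illustrative.

import random

def uniform_crossover(parent1, parent2):
    """Build two offspring gene by gene according to a random binary mask."""
    mask = [random.randint(0, 1) for _ in parent1]
    child1 = [a if m == 1 else b for a, b, m in zip(parent1, parent2, mask)]
    child2 = [b if m == 1 else a for a, b, m in zip(parent1, parent2, mask)]
    return child1, child2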
It is expected from the above operation that if good substrings from the parents get combined by crossover, the children are likely to have improved fitness. Further, because pressure is exerted by the survival-of-the-fittest selection method used in forming the population, when good strings are created in the progeny by crossover, they help the search process by propagating their superior characteristics in the ensuing generations (BAGCHI, 1999). However, since the location of the appropriate (i.e. best) crossover site cannot be known a priori without considerable additional computation, a random site is chosen in practice (COLEY, 1998).
Still, the effect of crossover can be detrimental or beneficial. Therefore, to preserve the good strings in the population, not all strings in the population are used in crossover. A crossover probability ($p_c$) is used to decide whether a given member of the population will be crossed.
5.1.4 Mutation
While a crossover operator attempts to produce new strings of superior fitness by effecting large changes in a string's makeup, the need for local search around a current solution also exists. This is accomplished by mutation. Mutation additionally aims to maintain diversity in the population (BAGCHI, 1999).
In natural evolution, mutation is a random process where one allele of a gene is re-
placed by another to produce a new genetic structure. In GAs, mutation is randomly
applied with low probability and modifies elements in the chromosomes. Usually consid-
ered as a background operator, the role of mutation is often seen as providing a guarantee
that the probability of searching any given string will never be zero and acting as a safety
net to recover good genetic material which may be lost through the action of selection and
crossover (ZALZALA; FLEMING, 1997).
In practice, for example, if binary code is being used, a single bit in a string may
be altered (from 0 to 1, or 1 to 0) with a small probability, creating a new solution (see
Figure 5.6).
Figure 5.6: Mutation example.
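A minimal Python sketch of this bit-flip mutation, flipping each bit independently with a small probability; the function name and the default probability are illustrative.

import random

def bit_flip_mutation(chromosome, p_m=0.01):
    """Flip each bit of a binary chromosome with probability p_m."""
    return [1 - gene if random.random() < p_m else gene for gene in chromosome]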
Crossover aims at recombining parts of good substrings from good parent strings to create a better offspring. Mutation, on the other hand, alters a single child string locally in the hope of creating a superior child string.
5.1.5 Elitism
Fitness-proportionate selection does not guarantee the selection of any particular individual, including the fittest. Even when the fittest individual is much fitter than any other, it will occasionally not be selected. To not be selected is to die. Thus, with fitness-proportionate selection, the best solution to the problem discovered so far can be regularly thrown away.
Although it appears counterproductive, this can be advantageous for some problems because it slows the algorithm, allowing it to explore more of the search space before convergence (COLEY, 1998). There is a balance between exploration of the search space and exploitation of the best solutions found so far: the more the search exploits, the faster the progress of the algorithm, but the greater the possibility of the algorithm failing to locate the true global optimum (BAGCHI, 1999).
For many applications the search speed can be greatly improved by not losing the best, or elite, members between generations. Ensuring the propagation of the elite members is termed elitism, and it requires not only that the elite members are selected, but also that copies of them are not disrupted by crossover or mutation (COLEY, 1998).
5.1.6 Termination
Like biological evolution, which goes on and on, a genetic algorithm continues to run, evolving solutions until explicitly terminated. After each population is evaluated, the algorithm checks a set of termination conditions. There are several common termination conditions used in genetic algorithms (BANZHAF et al., 1998):
Maximum generations - the genetic algorithm stops when the specified number of generations has evolved;
Elapsed time - the genetic process will end when a specified time has elapsed;
No change in fitness - the genetic process will end if there is no change in the population's best fitness for a specified number of generations;
Stall time limit - the algorithm stops if there is no improvement in the objective function during an interval of time (in seconds) equal to the stall time limit.
The termination conditions are often used in combination. A genetic algorithm can be terminated after a certain number of generations, when a particular average fitness has been achieved, or when the fitness function does not change after a specific number of generations.
5.2 Proposed Algorithm
5.2.1 Individual Representation
The individual is represented by a sequence of processes to be executed. This sequence is divided into blocks whose size equals the number of processors. Figure 5.7 shows an example of an individual.
Figure 5.7: An example of a scheduling individual.
A block represents the processes that will be executed in parallel, each process on a processor. Each block also has an index that indicates an execution step (see Figure 5.7). Note that a block can contain fewer processes than its size. This occurs because of the dependencies among the processes. In the example, processes 1, 4, 6 and 8 depend on the ones indicated by 2 and 7. This lack of processes is called waiting time.
In our scheduling algorithm, the genetic operators affect only the processes. They modify where each process will be executed (on which processor) and at which execution step it will be executed.
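A minimal sketch, in Python, of how such an individual could be stored in memory follows; the concrete container and names are illustrative assumptions and do not come from the MPhyScaS implementation. Each execution step is a list indexed by processor, and None marks a waiting-time slot.

# Three processors (p0, p1, p2); each inner list is one execution step (block).
# None means the processor is idle at that step (waiting time caused by
# dependencies among the processes). The values are illustrative only.
individual = [
    [5, 3, 9],        # execution step 0
    [2, 7, 6],        # execution step 1
    [8, None, None],  # execution step 2: processors 1 and 2 wait
    [4, 1, None],     # execution step 3
]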
5.2.2 Selection and Elitism
In our problem, the individuals are selected through a fitness-based process, the fitness-proportionate selection method. This method is used to select the individuals that will be affected by both the crossover operator and the mutation operator. Each of these operators selects a proportion of the population; the percentage values of these proportions are given in Chapter 7. Besides this, the elitism concept is used. The percentage of elitism considered is also given in Chapter 7.
5.2.3 Crossover
The genetic algorithm proposed herein applies a new crossover operator to generate two new individuals for the next generation. It is a two-level crossover consisting of a gene-collection stage and a repeated-information swapping stage. In the gene-collection stage, the processors of the parents are divided into two partitions: even processors and odd processors. Each partition is given to a new offspring, so that the first offspring gets the information of the even processors of the first parent and the odd processors of the second parent, while the second offspring gets the information of the odd processors of the first parent and the even processors of the second parent. This process is shown in Figure 5.8, and a sketch of this stage is given after the figure.
Figure 5.8: First step of crossover individuals.
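A minimal Python sketch of this first (gene-collection) stage, assuming the list-of-steps representation sketched in Section 5.2.1; the second stage (swapping repeated jobs that occupy the same position) and the feasibility check are omitted here, and the function name is illustrative.

def partition_crossover(parent1, parent2):
    """First stage of the proposed crossover: offspring 1 takes the even
    processors of parent 1 and the odd processors of parent 2; offspring 2
    takes the complementary columns."""
    num_procs = len(parent1[0])
    child1, child2 = [], []
    for step1, step2 in zip(parent1, parent2):
        child1.append([step1[p] if p % 2 == 0 else step2[p] for p in range(num_procs)])
        child2.append([step2[p] if p % 2 == 0 else step1[p] for p in range(num_procs)])
    return child1, child2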
In the repeated-information swapping stage, repeated jobs associated with each offspring are searched for, and repeated jobs occupying the same position in the two new offspring are considered, as shown in Figure 5.9. The two new offspring swap these repeated jobs between them (Figure 5.9 also shows the result of this swapping based on the result obtained in Figure 5.8). This stage is important to increase the chance of an offspring being feasible.
Figure 5.9: Second step of crossover individuals.
Although this does not occur every time, when an offspring is not feasible we discard it, in order to have a new generation containing only feasible individuals. An example of a crossover that generates unfeasible offspring is depicted in Figure 5.10, which shows the first step of the crossover. The second step of the crossover is depicted in Figure 5.11.
Figure 5.10: First step of crossover individuals that generates unfeasible ospring.
One can note that in Figure 5.11 it is possible to swap processes 4 and 7, since they appear repeated in the first and second offspring, respectively. However, processes 1 and 2, which also appear repeated in the first and second offspring, respectively, cannot be swapped because they do not occupy the same position. As a consequence, the offspring generated after the second step of the crossover are not feasible.
Figure 5.11: Second step of crossover individuals that generates unfeasible ospring.
5.2.4 Mutation
The goal of the uniform mutation is to exchange two neighboring genes without violating the precedence relationship, in order to create an individual that could not have been produced by the crossover operator (KIM, 2007). The uniform mutation operates as follows: for each individual selected for this operation, an integer random number r is generated and two random processes in the rth execution step are swapped. This operator guarantees a modification in the communication cost values without violating the precedence of processes.
Figure 5.12 shows an example of this process. Here, we suppose that the integer random number r selected is 3, so the third execution step is the one that will be mutated. We then suppose that the numbers 0 and 2 are selected at random, indicating that the processes to be swapped are the one in the third execution step of processor 0 and the one in the third execution step of processor 2. These processes are marked in Figure 5.12, which also shows the result after the mutation.
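A minimal Python sketch of this mutation operator, again assuming the list-of-steps representation sketched in Section 5.2.1; the helper name is illustrative.

import random

def step_swap_mutation(individual, num_processors):
    """Swap two randomly chosen processor slots inside one random execution step."""
    mutant = [step[:] for step in individual]        # copy so the parent is kept
    r = random.randrange(len(mutant))                # execution step to mutate
    a, b = random.sample(range(num_processors), 2)   # two distinct processor slots
    mutant[r][a], mutant[r][b] = mutant[r][b], mutant[r][a]
    return mutant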
Figure 5.12: An example of the proposed mutation.
5.2.5 Fitness Function
Our scheduling algorithm seeks a trade-off between the execution time of the processes and the time necessary for these processes to communicate with each other. Besides that, we want to minimize the idle time. In other words, our scheduling algorithm aims to optimize three criteria together.
The first criterion to be optimized is the makespan (total finishing time), i.e. the completion time of the last job. The makespan function is represented by $C_{max} = \max\{C_i \mid i = 1, ..., n\}$, where $C_i$ is the finishing execution time of job $i$. The second criterion is the communication cost. We represent this function by $T$, where $T = \sum_{i,j} t_{ij}$, with $i, j$ ranging over pairs of processes that need to communicate and $t_{ij}$ being the time spent on that communication. The third criterion is the waiting time, which corresponds to the total idle time of the processors. The waiting time function is represented by $\sum_{j=1}^{n} WT_j$, i.e. the sum of the waiting time of each processor $j$.
Although in the MPhyScaS problem the optimization criterion is the minimization of these three functions, we combine the criteria in the following function:

$$\gamma = \omega_1 \cdot C_{max} + \omega_2 \cdot T + \omega_3 \cdot \sum_{j=1}^{n} WT_j,$$

where $\omega_1$, $\omega_2$ and $\omega_3$ are weights giving the relative importance of each function.
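A minimal Python sketch of this fitness function follows; the default weights correspond to the values 0.6, 0.3 and 0.1 used later in Chapter 7, and the argument names are illustrative.

def fitness(makespan, comm_cost, waiting_times, w1=0.6, w2=0.3, w3=0.1):
    """Weighted combination of makespan, communication cost and idle time.

    makespan      -- C_max, the finishing time of the last job
    comm_cost     -- T, total time spent on inter-process communication
    waiting_times -- iterable with the idle (waiting) time of each processor
    """
    return w1 * makespan + w2 * comm_cost + w3 * sum(waiting_times)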
Here, we present an example of an individual representation based on the proposed model. This GA individual representation is shown in Figure 5.13.
Figure 5.13: An example of the proposed individual representation.
Chapter 6
Case Study
In this chapter we describe the case study used to evaluate the scheduling algorithm pro-
posed in this dissertation.
6.1 Basics
Based on the example shown as a GA individual in Figure 5.13 in Chapter 5, we explain how the elements appear in the case study.
The Kernel is the first element to be approached in the case study. It is important to know that the Kernel requests a Block to execute an Algorithm. In order to make the Kernel reusable, the algorithms used by the Kernel are enumerated such that these virtual indexes are keys for a map (defined in Definition 6.1.1), and the user must indicate the actual algorithm for each virtual index.
Definition 6.1.1 A map is a function $M : V \rightarrow A$, where $V$ is the set of virtual indexes and $A$ is the set of actual indexes.
The map that does this association between virtual and actual algorithms for the Kernel is called the algorithm map. Based on our example, the way the Kernel makes this request to a Block is described as follows:

blockVector[0] → ExecuteAlgorithm(algorithmMap(0, 0))

The algorithm map can be represented as Table 6.1. Here, blockVector[0] indicates that our Block 0 is the required one. In algorithmMap(0, 0), the first number indicates the required Block and the second number indicates the virtual index of the algorithm that will be executed. Looking at Table 6.1, we see that the actual Algorithm associated with this virtual one is our Algorithm 0.
Table 6.1: Algorithm map of Kernel
Block Virtual Algorithm Actual Algorithm
0 0 0
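As an illustration only (the snippet below is a hypothetical Python rendering of Table 6.1, not the MPhyScaS code), the algorithm map can be thought of as a lookup table keyed by the Block and the virtual index:

# Hypothetical rendering of Table 6.1: (block, virtual algorithm) -> actual algorithm.
algorithm_map = {(0, 0): 0}

def resolve_algorithm(block, virtual_index):
    """Translate a (block, virtual index) pair into the actual algorithm index."""
    return algorithm_map[(block, virtual_index)]

print(resolve_algorithm(0, 0))   # the Kernel's request algorithmMap(0, 0) -> Algorithm 0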
The Algorithm, similarly to the Kernel, asks a Group to compute a QuantityTask. The QuantityTasks called by the Algorithm are enumerated such that these virtual indexes are keys for a map (called the task map), and the user must indicate the actual QuantityTask for each virtual index. Based on our example, the way the Algorithm makes two requests to the Group (asking for the computation of two different QuantityTasks) is described as follows:

groupVector[0] → ComputeQuantityTask(taskMap(0, 0))
groupVector[0] → ComputeQuantityTask(taskMap(0, 1))

The task map can be represented as Table 6.2. Here, groupVector[0] indicates that our Group 0 is the required one. In taskMap(0, 0), the first number indicates the required Group and the second number indicates the virtual index of the QuantityTask that will be computed. Looking at Table 6.2, we see that the actual QuantityTask associated with this virtual one is our QuantityTask 0. In the case of the second line, where the map is used as taskMap(0, 1), Table 6.2 shows that the QuantityTask associated with this virtual one is our QuantityTask 1.
Table 6.2: Task map of Algorithm
Group   Virtual QuantityTask   Actual QuantityTask
0       0                      0
0       1                      1
The next element to be approached is the QuantityTask, which is different from the others presented here. The QuantityTask is a set of GroupTasks. Each GroupTask asks a Phenomenon to execute a Quantity. Based on our example, the way the QuantityTasks, GroupTasks, Phenomena and Quantities are represented is described as follows:
1. QuantityTask 0 - Description of QuantityTask 0
(a) GroupTask 0 - type: assembler; assembler structure: Global State 0 (global-
State0Name)
i. Phenomenon 0 (phenomenon0Name)
A. Quantity 0 - coupling: none; parameters: none
2. QuantityTask 1 - Description of QuantityTask 1
(a) GroupTask 1 - type: assembler; assembler structure: Global State 1 (global-
State1Name)
i. Phenomenon 0 (phenomenon0Name)
A. Quantity 1 - coupling: Phenomenon 0 (phenomenon0Name) Global
State 0 (globalState0Name); parameters: none
One can note that the QuantityTasks are defined in the first level. In the second level, the GroupTasks of the related QuantityTask are defined. In the third and fourth levels appear the Phenomenon and the Quantity required by the related GroupTask. The coupling information of Quantity 1 is defined within the Quantity's information: it means that Quantity 1 can only assemble the structure indicated by the GroupTask after Global State 0 has already been assembled.
There are other types of GroupTasks: solver, operator, neummanBC, and essentialBC. For each type of GroupTask the information can change. In the case of neummanBC or essentialBC, a Quantity is required for execution (similarly to the assembler type presented). In the case of solver or operator, some additional information is provided, but a Quantity is not required for execution.
6.2 Kernel
The Kernel represents the overall scenario of the simulation. It is responsible for the initialization of procedures, for global time loops and iterations, for global adaptive iterations, and for the articulation of activities to be executed by the Blocks in the lower level. The Kernel stores system data related to the parameters for its loops and iterations.
The kernel code requires the usage of two maps: the algorithm map and the system data map. The first one associates different keys to algorithms. It is represented by algorithmMap(x, y), where x identifies the Block that will execute the algorithm, and y is a different key for each algorithm that will be executed in the kernel. The second map associates different keys to single data items that come from the system data. It is represented by systemDataMap(x), where x is a different key for each data item considered from the system data.
6.2.1 System Data
The system data is a structure responsible for maintaining the data that directly affects only the system. Examples of such data are data related to time, the time step, and some conditions. The system data is also useful for keeping the kernel and the algorithms in touch.
6.3 Global States
As explained in Chapter 2, the global states are global data structures such as matrices,
vectors and scalars. They are stored in a group due to the fact that the group is responsible
not only for assembling and solving algebraic systems, but also for operating with the
structures when requested by its block.
6.4 Algorithms
The algorithms request groups to execute quantityTasks, which is the level immediately after Group in the MPhyScaS architecture. All algorithms must have a structure called algorithm data, where some information regarding the algorithm can be stored. Besides this, all algorithms must use six maps in their code, as follows:
Task map - associates different keys to quantityTasks. Represented by taskMap(x, y), where x identifies the group that will execute the quantityTask, and y is a different key for each quantityTask that will be executed in the algorithm;
State map - associates different keys to global states. Represented by stateMap(x, y), where x identifies the group that will retrieve or update the global state, and y is a different key for each global state considered in the algorithm;
System data map - associates different keys to single data items that come from the system data. Represented by systemDataMap(x), where x is a different key for each data item considered from the system data;
Block algorithm map - associates different keys to other algorithms. Represented by blockAlgorithmMap(x), where x is a different key for each other algorithm (referenced in the algorithm in execution) that will be executed by the same block as the algorithm in execution;
Algorithm data map - associates different keys to single data items that come from the algorithm data of another algorithm. This map can only be nonempty if the block algorithm map is also nonempty. Represented by algorithmDataMap(x, y), where x identifies the other algorithm from which the algorithm data is considered (the value here is the same key that appears in the block algorithm map), and y is a different key for each data item considered from that algorithm data.
These maps are responsible for representing specific data without knowing its instance. Because of this, it is easier to reuse these algorithms. An example of the usage of the task map can be seen in line 1 of Pseudo-Code 5. The actual identifier of a quantityTask is directly associated with the keys of the map. Therefore, in the example, taskMap(0, 0) is associated with QuantityTask 0 of Group 0 according to the task map table. Some of the following algorithms do not present all six maps, which means that the maps not presented are empty.
6.5 QuantityTasks
Here, the quantityTasks of each group will be presented separately. They will be shown in a four-level scheme. In the first level is the quantityTask and an explanation of what it will compute. In the second level are the groupTasks executed by the quantityTask. Note that a quantityTask can execute more than one groupTask, hence it is possible to find more than one item in the second level. In the third level is the phenomenon requested by the groupTask. In this level, only one item can be found as a sub-item of a groupTask. And, in the fourth level are the quantities executed by the phenomenon.
The groupTasks that appear in the second level can be of four types. The first one is the assembler; this type of groupTask also presents the assembled structure. The second type is the boundary condition, which appears as neummanBC and essentialBC, and it also presents the assembled structure. The third type is the solver and the fourth type is the operator. All solver and operator groupTasks invoke the group phenomenon to execute their tasks, while the assembler and boundary condition groupTasks invoke a phenomenon other than the group phenomenon.
When the groupTask is an assembler or a boundary condition one, the quantity must present information about coupled states and the parameters necessary for its execution. When the groupTask is of solver type, the quantity must present three global states that will compose the algebraic linear system and the parameters necessary for its execution. And when the groupTask is of operator type, the quantity must present the type of the operation, the global states that will compose it, and also the parameters of the operation.
Chapter 7
Experiments and Results
This chapter presents the evaluation of the scheduling algorithm proposed here and a comparison with the algorithms presented in Chapter 3, the reference algorithms of the related works. For this purpose, we implemented the algorithms and generated two abstract simulators, which are DAGs of processes based on the MPhyScaS architecture, and scheduled them with these algorithms. Besides the abstract simulators, we also used the case study presented in Chapter 6 to compare the proposed algorithm with the List Scheduling, LPT and SPT algorithms.
For the experiments, we used combinations of 4 algorithms to be executed, 3 problems to solve (the DAGs representing the processes and their dependencies), and 2 architectures where the simulations would run. This represents six scenarios (3 x 2 = 6 scenarios, for problems and architectures) and twenty-four executions (6 x 4 = 24 executions, for scenarios and algorithms).
Two architectures were regarded by the algorithms: the first one is composed of 3 homogeneous, fully connected processors, while the second one is composed of 6 homogeneous processors that are also fully connected. This chapter presents the results for each algorithm applied to the six scenarios.
For the three reference algorithms and the proposed genetic algorithm, we ran each algorithm 30 times and calculated the mean and the standard deviation of these simulations. In the list scheduling algorithm we use the MPhyScaS architecture to determine the priority of the processes. Processes in the same level have the same priority: the Kernel level has the highest priority and the Quantity level has the lowest priority.
We used the data of ten simulations to estimate the number of samples required, and found that about nine samples are needed to calculate the mean value of the fitness function, assuming a confidence interval of 95%. Based on this estimation, we collected the arithmetic mean and the standard deviation for each scenario.
For the sequential simulation of the problems, the communication cost and the waiting time functions have value zero, and the makespan function has its maximum value. The results for sequential execution will also be presented for each problem.
In the experiments, we use the values 0.6, 0.3, and 0.1 for the parameters $\omega_1$, $\omega_2$, and $\omega_3$, respectively. These values represent the weight of each function (makespan, communication cost, and waiting time, respectively) that we wanted to take into consideration. The makespan function is the most important for this work, because the total execution time is what we want to minimize. We also want to minimize the communication cost and the waiting time, but they are less important than the makespan. The waiting time function is less important than the communication cost.
The population size is a critical parameter in genetic algorithms. The GA converges to poor solutions if the population is too small, and it spends unnecessary computational resources if the population is excessively large (LOBO; LIMA, 2005). For the proposed genetic algorithm, the simulations were done with population sizes of 50 and 100 individuals (based on (MOATTAR; RAHMANI; DERAKHSHI, 2007; KIM, 2007)), using a crossover probability of 0.9 and a mutation rate of 0.1.
7.1 The First Abstract Simulator
The first problem has 50 nodes (processes) and 61 edges (dependencies). These dependencies were found by analysing the Petri net model that corresponds to this simulator. A maximum of 8 levels of nesting is found in the dependencies of this graph.
The value obtained using the fitness function for the sequential execution of the first problem is 1446.6. For the architecture with 3 homogeneous processors, the results for the list scheduling, LPT and SPT rules are presented in Table 7.1, and Table 7.2 shows the results found by the proposed genetic algorithm for 500, 1000 and 2000 iterations. In both Tables 7.1 and 7.2, the percentage gain obtained over the sequential execution is shown for all algorithms. For the GA, the percentage gain is calculated with respect to the result obtained after 2000 iterations.
Table 7.1: Results for list scheduling, LPT and SPT using the first architecture
            List Scheduling   LPT         SPT
Mean        969.9200          1045.5100   1017.1700
S.D.        36.7211           37.2302     40.0182
Gain (%)    32.9517           27.7264     29.6854
Table 7.2: Results for GA using the first architecture
Iterations          Fitness (50 individuals)   Fitness (100 individuals)
500      Mean       842.2633                   833.4733
         S.D.       31.0516                    21.0644
1000     Mean       839.9333                   829.4833
         S.D.       29.8499                    24.1853
2000     Mean       837.4533                   828.8533
         S.D.       29.3756                    24.0546
Gain (%)            42.1088                    42.7033
The results in bold font are the best mean fitness values for each iteration count considered. Note that the improvement observed between 1000 and 2000 iterations for both population sizes is very small, so the extra computation is not necessary in this example. The improvement of the proposed algorithm can be monitored in Figure 7.1.
For the architecture with 6 homogeneous processors, the results for the list scheduling, LPT and SPT rules are presented in Table 7.3, and Table 7.4 shows the results found by the proposed genetic algorithm for 500, 1000 and 2000 iterations.
Table 7.3: Results for list scheduling, LPT and SPT using the second architecture
List Scheduling LPT SPT
Mean 1066.7400 1104.3200 1132.8300
S.D. 114.8803 120.1449 122.5676
Gain (%) 26.2588 23.6610 21.6902
Figure 7.1: The convergence of the proposed genetic algorithm for 50 and 100 individuals
using first architecture.
Table 7.4: Results for GA using the second architecture
Iterations          Fitness (50 individuals)   Fitness (100 individuals)
500      Mean       707.2600                   697.2100
         S.D.       37.0818                    36.6040
1000     Mean       689.2900                   686.1900
         S.D.       38.0706                    35.5946
2000     Mean       682.0100                   680.0800
         S.D.       36.3665                    32.2251
Gain (%)            52.8543                    52.9877
The improvement of the proposed algorithm is depicted in Figure 7.2.
The data presented show that the population with 100 individuals produces better schedules than the one with 50 individuals throughout all iterations. Considering the first architecture, at the 500th iteration the 100-individual population had already produced better schedules than the 50-individual population at the end of 2000 iterations. Note that in both situations the 100-individual population converged faster than the 50-individual population.
Figure 7.2: The convergence of the proposed genetic algorithm for 50 and 100 individuals
using second architecture.
7.2 The Second Abstract Simulator
The second problem has 126 nodes (processes) and 223 edges (dependencies), which were found from the Petri net model. This graph has a maximum of 16 levels of nesting. One can note that this implies the second problem is more difficult to solve than the first one.
The value obtained using the fitness function for the sequential execution of the second problem is 3510.0. For the architecture with 3 homogeneous processors, the results for the list scheduling, LPT and SPT rules are presented in Table 7.5, and Table 7.6 shows the results found by the proposed genetic algorithm for 500, 1000 and 2000 iterations. In both Tables 7.5 and 7.6, the percentage gain obtained over the sequential execution is shown for all algorithms. For the GA, the percentage gain is calculated with respect to the result obtained after 2000 iterations.
Table 7.5: Results for list scheduling, LPT and SPT using the first architecture
List Scheduling LPT SPT
Mean 2592.1100 2580.9700 2618.6800
S.D. 157.4361 154.3924 159.9462
Gain (%) 26.1507 26.4681 25.3937
Table 7.6: Results for GA using the first architecture
Iterations          Fitness (50 individuals)   Fitness (100 individuals)
500      Mean       2447.6730                  2445.8930
         S.D.       62.3827                    61.3071
1000     Mean       2399.9230                  2401.9930
         S.D.       59.5612                    74.9665
2000     Mean       2379.2830                  2385.7330
         S.D.       63.3114                    70.7054
Gain (%)            32.2141                    32.0304
The convergence over all iterations of the genetic algorithm results just presented can be seen in Figure 7.3.
Figure 7.3: The convergence of the proposed genetic algorithm for 50 and 100 individuals
using the first architecture.
For the second architecture (with 6 homogeneous processors), the results for the list scheduling, LPT and SPT rules are presented in Table 7.7, and Table 7.8 shows the results found by the proposed genetic algorithm for 500, 1000 and 2000 iterations.
The convergence over all iterations of the genetic algorithm results just presented can be seen in Figure 7.4.
Table 7.7: Results for list scheduling, LPT and SPT using the second architecture
List Scheduling LPT SPT
Mean 2944.8700 2817.8200 2842.7300
S.D. 158.5764 155.0172 160.9108
Gain (%) 16.1006 19.7202 19.0105
Table 7.8: Results for GA using the second architecture
Iterations          Fitness (50 individuals)   Fitness (100 individuals)
500      Mean       2254.2800                  2242.7870
         S.D.       74.0134                    62.4255
1000     Mean       2149.6600                  2140.3770
         S.D.       66.9043                    65.4855
2000     Mean       2077.6200                  2068.6470
         S.D.       67.2494                    59.3870
Gain (%)            40.8085                    41.0642
One can note that this second problem is more difficult to solve than the previous one, so both population sizes behaved in almost the same way. For this problem, having more individuals did not give the 100-individual population an advantage, due to the complexity of the problem.
7.3 Case Study
The case study has 150 nodes (processes) and 237 edges (dependencies). These dependencies were found by analysing the Petri net model that corresponds to this problem. A maximum of 12 levels of nesting is found in the dependencies of this graph.

Figure 7.4: The convergence of the proposed genetic algorithm for 50 and 100 individuals using the second architecture.

The value obtained using the fitness function for the sequential execution of the case study is 5680.8. For the architecture with 3 homogeneous processors, the results for the list scheduling, LPT and SPT rules are presented in Table 7.9, and Table 7.10 shows the results found by the proposed genetic algorithm for 500, 1000 and 2000 iterations. In both Tables 7.9 and 7.10, the percentage gain obtained over the sequential execution is shown for all algorithms. For the GA, the percentage gain is calculated with respect to the result obtained after 2000 iterations.
Table 7.9: Results for list scheduling, LPT and SPT using the first architecture
List Scheduling LPT SPT
Mean 5464.2300 5356.8000 5495.4900
S.D. 256.9011 255.3976 257.7374
Gain (%) 3.8123 5.7034 3.2620
The convergence over all iterations of the genetic algorithm results just presented can be seen in Figure 7.5.
For the second architecture (with 6 homogeneous processors), the results for the list scheduling, LPT and SPT rules are presented in Table 7.11, and Table 7.12 shows the results found by the proposed genetic algorithm for 500, 1000 and 2000 iterations.
The improvement of the proposed algorithm is depicted in Figure 7.6.
Considering the first architecture, at the 500th iteration the 100-individual population had already produced better schedules than the 50-individual population at the end of 2000 iterations. Note that in both situations the 100-individual population converged faster than the 50-individual population.
Table 7.10: Results for GA using the first architecture
Iterations          Fitness (50 individuals)   Fitness (100 individuals)
500      Mean       4925.2800                  4838.9400
         S.D.       101.4804                   111.0909
1000     Mean       4877.0000                  4796.0000
         S.D.       96.3841                    117.8795
2000     Mean       4852.7400                  4772.9200
         S.D.       102.7384                   111.5300
Gain (%)            14.5765                    15.9816
Table 7.11: Results for list scheduling, LPT and SPT using the second architecture
List Scheduling LPT SPT
Mean 6727.8600 6727.8000 6982.8900
S.D. 259.8337 258.9764 260.4492
Gain (%) 18.4315 18.4305 22.9209
Table 7.12: Results for GA using the second architecture
Iterations          Fitness (50 individuals)   Fitness (100 individuals)
500      Mean       5350.2400                  5294.6800
         S.D.       147.6926                   107.5116
1000     Mean       5269.4200                  5221.0000
         S.D.       148.4255                   107.9098
2000     Mean       5232.9200                  5181.1800
         S.D.       154.3999                   106.3545
Gain (%)            7.8841                     8.7949
Looking at the results for the second architecture, one can note that only the GA achieves any improvement. The other algorithms do not achieve improvement because the communication cost and the waiting time consume more time than is saved by the parallel execution.
Figure 7.5: The convergence of the proposed genetic algorithm for 50 and 100 individuals
using the first architecture.
Figure 7.6: The convergence of the proposed genetic algorithm for 50 and 100 individuals
using the second architecture.
Chapter 8
Conclusions and Future Works
8.1 Conclusions
This dissertation explores an important factor related to job parallelization: identifying parallel jobs and scheduling them. Scheduling algorithms explore the job dependencies in order to define a schedule near the optimal one. These algorithms consider only the makespan (total completion time) function to obtain the schedule. This process can become very complex when other effects are considered: the architecture where the simulations will run, the communication cost between jobs, and the waiting time of each processor.
Considering these other effects leads to a complex environment composed of a great quantity of variables. This makes the simple scheduling algorithms obsolete. In these circumstances, the algorithm proposed in this dissertation was modeled to be capable of incorporating these effects and of defining a schedule based on the costs associated with them.
The need for identifying the job dependencies eventually generated a new problem. A technique based on Coloured Petri Nets (CPN) was used for searching and identifying the dependencies between jobs. The CPN model is used to represent the MPhyScaS simulator and the simulation data. The model definition involves the representation of the MPhyScaS architecture objects and their relationships. The model is used to capture the causal and data dependencies between processes of an MPhyScaS simulator. A compiler was developed to transform the simulator and simulation data into a CP-net that follows the defined model.
The model definition allowed us to go through all reachable states of the generated CP-net. We used CPNTools to create the model, generate the reachable state graph, and analyze this graph to recognize dependencies between processes. These dependencies are used by the scheduling algorithms as one of their parameters.
For the scheduling problem, three different scheduling algorithms were initially implemented: List Scheduling, Longest Processing Time (LPT), and Shortest Processing Time (SPT). These algorithms take into consideration only the makespan function and the job dependencies. The first algorithm prioritizes the jobs with the highest priority, which is given by the user, to determine the schedule. The second one prioritizes the jobs with the longest processing time, while the last one prioritizes the jobs with the shortest processing time.
When dealing with MPhyScaS problems, other parameters are relevant: the architecture where the simulations will run, and the communication cost between jobs due to the large amount of data communicated. Although we could have adapted any of the developed algorithms, we preferred to use another technique. The proposed work suggests that complex problems involving many parameters (effects) can be solved using Computational Intelligence (CI) techniques. The study and improvement of these techniques demonstrated that the biggest challenge is not in the problem itself but in the art of using the techniques: the capacity to visualize what each technique has to offer and to use this to contribute to the problem solution. Hence, we decided to develop a Genetic Algorithm (GA). Applying a Genetic Algorithm to scheduling problems with a well-defined hierarchy proved to be very interesting and valuable.
Due to the complexity of the MPhyScaS problem, the usage of search algorithms such as GA is essential to find a good job schedule. Hence, for each new approach (new effects to be considered) a new search for a schedule near the optimal should be done. Still, the algorithm is able to adapt and solve the problem with the same efficiency, even if there are changes in the architecture where the simulation will run or if there are new effects to be considered.
Applying the GA to the MPhyScaS problem, we obtained relevant results in comparison with the other algorithms implemented. In the application of the GA to the first generated problem, the results obtained are better than those of the other three algorithms, even when we decrease the GA population size from 100 to 50 individuals. When applied to the second generated problem, the GA obtained equivalent results for the different population sizes (100 and 50 individuals), but still better than the other algorithms. For the case study, the GA also obtained better results. The behavior of the GA was almost identical for the different population sizes in the second problem. This was expected, since the second problem has many dependencies, which leaves the GA with few scheduling options to search.
The fact that the GA is able to incorporate and consider several effects while performing the job scheduling makes this technique useful for MPhyScaS. The main reason for this is the little time spent by the GA to find a schedule (about 2 minutes for the presented problems) in comparison with the time gained in the simulation. Thus, one can conclude that the proposed work is a significant contribution to the areas of system modeling, parallel processing and scheduling. It also contributes to the area of multi-physics simulation.
8.2 Future Works
If we analyze other works that use Genetic Algorithms, we note that the way the GA parameters are used has an important impact on the algorithm performance. In each GA iteration, the population passes through selection, part of the population passes through the crossover task, and another part passes through the mutation task. In the proposed GA, a study to optimize the parameters related to these tasks can be performed, thus shortening the time needed to converge to a satisfactory solution (schedule).
The parameters used to balance the importance of the effects in the fitness function may also be optimized. These parameters indicate how much each effect contributes to the solution, and their optimization may provide better results. Other Computational Intelligence techniques should be studied to determine which technique is the most appropriate for this.
The proposed GA has demonstrated that it is able to incorporate several other effects that influence the search for a better schedule. Therefore, new parameters (effects) should be incorporated in order to contribute to a better schedule choice. Hence, new experiments involving other effects should be done so that, by analyzing the results, the quality of the solutions found can be improved.
Other simulation settings involving the MPhyScaS problem may be tried out in order to analyze and extend the application of the GA. Some simulations can be done to verify the influence of the variables according to the problems.
Another important point that should be studied is how to measure the cost of each MPhyScaS job. In the case study presented, the measurements of the computation costs are based on some evidence available in the simulation, such as the mesh size and the nodal dimension of the phenomenon. A study should be done in order to obtain more realistic measures.
APPENDIX A
Case Study
A.1 Kernel Algorithm
The pseudo-code of the kernel used in the case study is presented in Pseudo-Code 4. Lines 1 and 2 of Pseudo-Code 4 initialize Block 0 and Block 1, respectively. In line 3, the kernel initializes the time data, computing the initial time step of the simulation.
From line 6 to line 14 the iteration itself occurs, until the time interval of the simulation is reached. Basically, this kernel executes an algorithm, verifies the returned condition, updates the time data and stores the results of the iteration so they can be post-processed. The execution of the algorithm occurs in line 7, where the algorithm that performs the U iteration is called. Lines 8 to 11 verify the condition returned by the executed algorithm. In line 12 an algorithm to update the time data is executed. Then, the results obtained in the iteration are stored so that the post-processing phenomenon can present them to the user. The algorithm map and the system data map related to this kernel are shown in Table A.1 and Table A.2, respectively.
Pseudo-Code 4: Kernel algorithm pseudo-code.
1   blockVector[0] → ExecuteAlgorithm(algorithmMap(0,0))  // Initialize Block 0
2   blockVector[1] → ExecuteAlgorithm(algorithmMap(1,0))  // Initialize Block 1
3   blockVector[0] → ExecuteAlgorithm(algorithmMap(0,1))  // Initialize time data. It calls an algorithm for initial time step computation
4   double tn = systemData → GetData(systemDataMap(0))
5   double interval = systemData → GetData(systemDataMap(1))
6   while tn ≤ interval do
7       blockVector[0] → ExecuteAlgorithm(algorithmMap(0,2))  // Iteration of U
8       int condition = systemData → GetData(systemDataMap(2))  // Iteration condition
9       if condition < 0 then
10          return condition
11      end
12      blockVector[0] → ExecuteAlgorithm(algorithmMap(0,3))  // Update time data. It calls an algorithm for time step computation
13      blockVector[1] → ExecuteAlgorithm(algorithmMap(1,1))  // Store results for post-processing phenomenon
14  end
A.2 System Data
The system data of this case study are shown in Table A.3.
A.3 Global States
The global states of this case study are presented in Table A.4 and they are separated by
groups.
Table A.1: Algorithm map of Kernel algorithm
Block   Algorithm   Actual Algorithm
0       0           0
0       1           7
0       2           2
0       3           8
1       0           5
1       1           6
Table A.2: System data map of Kernel algorithm
Index Data
0 1 (tn)
1 3 (interval)
2 4 (condition)
Table A.3: System Data
Index Data
0 dt
1 tn
2 tnp1
3 interval
4 condition
A.4 Algorithms
A.4.1 Algorithm 0
Algorithm 0 initializes two global states: Un and Cn. Hence, it computes two quanti-
tyTasks. The first one is executed by Group 0 and initializes Un as shown in line 1 of
Pseudo-Code 5. The second quantityTask is executed by Group 1 and initializes Cn as
shownin line 2 of Pseudo-Code 5. The task map of Algorithm 0 is presented in Table A.5.
93
Table A.4: Global States for each Group
Group   Code   Global State
0       0      Un
0       1      Unp1
0       2      Uk
0       3      Ukp1
0       4      dU
0       5      errU
0       6      gSystemData0
0       7      trSig
0       8      K
0       9      F
0       20     trS
0       22     aux1
1       10     Cn
1       11     Cnp1
1       12     Ck
1       13     Ckp1
1       14     dC
1       15     errC
1       16     gSystemData1
1       17     M
1       18     G
1       19     Kc
1       21     aux0
1       23     aux2
1       24     aux3
2       25     gSystemData2
Pseudo-Code 5: Algorithm 0 pseudo-code.
1   groupVector[0] → ComputeQuantityTask(taskMap(0,0))  // Initialize Un
2   groupVector[1] → ComputeQuantityTask(taskMap(1,0))  // Initialize Cn
Table A.5: Task map of Algorithm 0
Group QuantityTask Actual QuantityTask
0 0 0
1 0 0
A.4.2 Algorithm 1
Algorithm 1 computes the initial time step and updates the system group's time data in the global state gSystemData2. Lines 1 to 3 of Pseudo-Code 6 compute the minimum time step at each group; in this case, these groups are Group 0 and Group 1. In line 4 of the same algorithm, a quantityTask is executed to compute the minimum time step among these groups. After that, the system data is updated with the value of the time step stored in the global state gSystemData2, as can be seen in lines 5 and 6 of Pseudo-Code 6.
The task map, state map and system data map of Algorithm 1 are presented in Table A.6, Table A.7 and Table A.8, respectively.
Pseudo-Code 6: Algorithm 1 pseudo-code.
1   for int i = 0; i < 2; i++ do
2       grpVec[i] → ComputeQuantityTask(taskMap(i,0))  // Compute minimum time step at each group (Group 0, Group 1)
3   end
4   groupVector[2] → ComputeQuantityTask(taskMap(2,0))  // Compute minimum time step among Groups 0, 1. There is no modification on time step data for groups other than the SystemGroup
5   double[] timeData = groupVector[2] → RetrieveState(stateMap(2,0))  // Retrieve time data vector
6   systemData → Update(systemDataMap(0), timeData[0])  // Updates time step in System Data
A.4.3 Algorithm 2
Algorithm 2 represents U iteration. Lines 1 and 2 of Pseudo-Code 7 initialize vector
fields for U iteration and Picard iteration respectively. In line 5, Algorithm 2 informs to
95
Table A.6: Task map of Algorithm 1
Group QuantityTask Actual QuantityTask
0 0 1
1 0 1
2 0 0
Table A.7: State map of Algorithm 1
Group State Actual State
2 0 25 (gSystemData2)
Table A.8: System data map of Algorithm 1
Index Data
0 0 (dt)
the algorithm it executes not to initialize its vector fields. This is because Algorithm 2 has
already initialize them (line 2).
From line 1 to line 15 of Pseudo-Code 8 the iteration itself runs until errU satisfies
the tolerance or the number of iterations reaches the maximum value permitted. Lines 2
to 4 solve the algebraic linear system: line 2 assembles the matrix and the strength
vector, line 3 introduces boundary conditions and line 4 solves the system, computing
Ukp1. The trace of the strain tensor is computed in line 5. Algorithm 2 executes another
algorithm in line 6. If this other algorithm returns a value less than zero, Algorithm 2
returns this value (line 9), since it indicates an error. Otherwise, Algorithm 2 computes
errU in order to decide whether the iteration will continue.
When the iteration finishes, Algorithm 2 verifies whether the number of executed iterations
reached the maximum. If so, the algorithm did not converge and an error is returned
(lines 17 and 18). Otherwise, the algorithm converged successfully. In lines 21 and 22,
Unp1 and Cnp1 are updated for the next execution.
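Written out explicitly, the stopping quantity computed in line 11 of Pseudo-Code 8 (and
detailed as QuantityTask 7 of Group 0 in Section A.5.1) is the relative change of the U
iterate, measured in the Euclidean norm used by the vecNorm2 operator (here U_k and
U_{k+1} stand for the global states Uk and Ukp1):

\[
  \mathit{errU} \;=\; \frac{\lVert U_{k+1} - U_{k} \rVert_{2}}{\lVert U_{k+1} \rVert_{2}} .
\]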
The task map, state map, system data map, algorithm data, block algorithm map and
algorithm data map of Algorithm 2 are presented in Table A.9, Table A.10, Table A.11,
Table A.12, Table A.13 and Table A.14 respectively.
Pseudo-Code 7: Algorithm 2 pseudo-code (first part).
1: groupVector[0] ComputeQuantityTask(taskMap(0,0))  // Initialize vector fields for iteration of U. Uk = Un
2: groupVector[1] ComputeQuantityTask(taskMap(1,0))  // Initialize vector fields for Picard iteration
3: double tnp1 = systemDataGetData(systemDataMap(0))
4: double errU = 1
5: blockAlgorithmMap(0) SetAlgorithmData(algorithmDataMap(0,0), 0)  // Do not perform initialization
6: double tolU = algorithmData[0]
7: int maximumIterations = algorithmData[1]
8: int iteration = 0
Pseudo-Code 8: Algorithm 2 pseudo-code (second part).
1: while errU > tolU && iteration < maximumIterations do
2:     groupVector[0] ComputeQuantityTask(taskMap(0,1))  // Assembles matrix and strength vector. F needs Ckp1, tnp1.
3:     groupVector[0] ComputeQuantityTask(taskMap(0,2))  // Introduces boundary conditions
4:     groupVector[0] ComputeQuantityTask(taskMap(0,3))  // Solves the system. Computes Ukp1
5:     groupVector[0] ComputeQuantityTask(taskMap(0,4))  // Computes tr(σ)
6:     thisBlock ExecuteAlgorithm(blockAlgorithmMap(0))  // Picard iteration. Computes Ckp1
7:     int cond = blockAlgorithmMap(0) GetAlgorithmData(algorithmDataMap(0,1))  // Retrieves Picard result
8:     if cond < 0 then
9:         return cond
10:    end
11:    groupVector[0] ComputeQuantityTask(taskMap(0,5))  // Computes errU. errU = ||Ukp1 - Uk|| / ||Ukp1||
12:    errU = groupVector[0] RetrieveState(stateMap(0,0))
13:    groupVector[0] ComputeQuantityTask(taskMap(0,6))  // Uk = Ukp1
14:    iteration++
15: end
16: if iteration >= maximumIterations then
17:    algorithmData[2] = failureCodes[0]
18:    return 0
19: end
20: algorithmData[2] = successCodes[0]
21: groupVector[0] ComputeQuantityTask(taskMap(0,7))  // Unp1 = Uk
22: groupVector[1] ComputeQuantityTask(taskMap(1,1))  // Cnp1 = Ck
23: return 1
Table A.9: Task map of Algorithm 2
Group  QuantityTask  Actual QuantityTask
0      0             2
0      1             3
0      2             4
0      3             5
0      4             6
0      5             7
0      6             10
0      7             11
1      0             2
1      1             10
Table A.10: State map of Algorithm 2
Group  State  Actual State
0      0      5 (errU)
0      1      2 (Uk)
0      2      3 (Ukp1)
0      3      1 (Unp1)
1      0      11 (Cnp1)
1      1      12 (Ck)
Table A.11: System data map of Algorithm 2
Index Data
0 2 (tnp1)
Table A.12: Algorithm data of Algorithm 2
Index  Data
0      tolU
1      maximum iterations
2      failure/success code
Table A.13: Block algorithm map of Algorithm 2
Index  Algorithm
0      3
Table A.14: Algorithm data map of Algorithm 2
Algorithm  Index  Actual Index
0          0      3 (initialization)
0          1      2 (condition)
A.4.4 Algorithm 3
Algorithm 3 represents the Picard iteration. Lines 1 to 3 of Pseudo-Code 9 initialize the
vector fields if the corresponding flag is set. In this case, the vector fields are not
initialized, because they were already initialized by Algorithm 2.
From line 11 to line 21 the iteration itself runs until errC satisfies the tolerance or the
number of iterations reaches the maximum value permitted. Line 12 shows the quantityTask
that computes the trace of S. Lines 13 to 15 solve the algebraic linear system: line 13
assembles the matrix and the strength vector, line 14 introduces boundary conditions and
line 15 solves the system, computing Ckp1. Lines 16 and 17 prepare data for the errC
computation (line 18), which decides whether the iteration will continue.
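In the same notation as before, the residue prepared in line 16 and the stopping quantity of
line 18 of Pseudo-Code 9 (QuantityTasks 6 and 7 of Group 1 in Section A.5.2) can be written
as (with C_k and C_{k+1} standing for the global states Ck and Ckp1):

\[
  R_{C} \;=\; M\,C_{k+1} - G ,
  \qquad
  \mathit{errC} \;=\; \frac{\max\{\, \lVert R_{C} \rVert_{2},\; \lVert C_{k+1} - C_{k} \rVert_{2} \,\}}{\lVert C_{k+1} \rVert_{2}} .
\]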
When the iteration finishes, Algorithm 3 verifies whether the number of executed iterations
reached the maximum. If so, the algorithm did not converge and an error is returned (lines
2 and 3 of Pseudo-Code 10). Otherwise, the algorithm converged successfully (lines 6
and 7). In line 5 Cnp1 is updated for the next execution.
The task map, state map, system data map and algorithm data of Algorithm 3 are
presented in Table A.15, Table A.16, Table A.17 and Table A.18 respectively.
Pseudo-Code 9: Algorithm 3 pseudo-code.
1: if algorithmData[3] then
2:     groupVector[1] ComputeQuantityTask(taskMap(1,0))  // Ck = Ckp1
3: end
4: double dt = systemDataGetData(systemDataMap(0))
5: double tnp1 = systemDataGetData(systemDataMap(1))
6: double tn = tnp1 - dt
7: double tolC = algorithmData[0]
8: int maximumIterations = algorithmData[1]
9: double errC = 1
10: int iteration = 0
11: while errC > tolC && iteration < maximumIterations do
12:     groupVector[0] ComputeQuantityTask(taskMap(0,0))  // Computes tr(S). trS needs trSig, Ck and the parameters C0, e
13:     groupVector[1] ComputeQuantityTask(taskMap(1,1))  // Assembles matrix and strength vector. M needs trSig, Ck, dt. G needs dt, Cn
14:     groupVector[1] ComputeQuantityTask(taskMap(1,2))  // Introduces boundary conditions
15:     groupVector[1] ComputeQuantityTask(taskMap(1,3))  // Solves the system. Computes Ckp1
16:     groupVector[1] ComputeQuantityTask(taskMap(1,4))  // Prepares data for residue computation. RC = M · Ckp1 - G
17:     groupVector[1] ComputeQuantityTask(taskMap(1,2))  // Introduces boundary conditions
18:     groupVector[1] ComputeQuantityTask(taskMap(1,5))  // Computes errC. errC = max{||RC||, ||Ckp1 - Ck||} / ||Ckp1||
19:     groupVector[1] ComputeQuantityTask(taskMap(1,0))  // Ck = Ckp1
20:     iteration++
21: end
Pseudo-Code 10: Algorithm 3 pseudo-code.
1: if iteration >= maximumIterations then
2:     algorithmData[2] = failureCodes[0]
3:     return 0
4: end
5: groupVector[1] ComputeQuantityTask(taskMap(1,6))  // Cnp1 = Ckp1
6: algorithmData[2] = successCodes[0]
7: return 1
Table A.15: Task map of Algorithm 3
Group  QuantityTask  Actual QuantityTask
0      0             12
1      0             11
1      1             3
1      2             4
1      3             5
1      4             6
1      5             7
1      6             10
Table A.16: State map of Algorithm 3
Group  State  Actual State
1      0      12 (Ck)
1      1      13 (Ckp1)
1      2      11 (Cnp1)
A.4.5 Algorithm 4
Algorithm 4 computes the minimum time step among the groups for each iteration of the
loop and updates the system group's time data in its gSystemData. The pseudo-code of this
algorithm is shown in Pseudo-Code 11.
Table A.17: System data map of Algorithm 3
Index Data
0 0 (dt)
1 2 (tnp1)
Table A.18: Algorithm data of Algorithm 3
Index Data
0 tolC
1 maximum iterations
2 failure/success code
3 initialization
In lines 1 to 3 each group computes its time step. In line 4, the system group (Group 2 in
this case study) computes the minimum time step among all other groups. Note that this
task does not modify the time step data of any group but the system group. The time step
value is then updated in the system data (line 6).
The task map, state map and system data map of Algorithm 4 are presented in Table A.19,
Table A.20 and Table A.21 respectively.
Pseudo-Code 11: Algorithm 4 pseudo-code.
1: for int i = 0; i < 2; i++ do
2:     groupVector[i] ComputeQuantityTask(taskMap(i,0))  // Computes minimum time step at each group (Group 0, Group 1)
3: end
4: groupVector[2] ComputeQuantityTask(taskMap(2,0))  // Computes minimum time step among Groups 0, 1. There is no modification on time step data for other groups but SystemGroup
5: double[] timeData = groupVector[2] RetrieveState(stateMap(2,0))  // Retrieves time data vector
6: systemDataUpdate(systemDataMap(0), timeData[0])  // Updates time step in System Data
Table A.19: Task map of Algorithm 4
Group QuantityTask Actual QuantityTask
0 0 8
1 0 8
2 0 0
Table A.20: State map of Algorithm 4
Group State Actual State
2 0 25 (gSystemData2)
Table A.21: System data map of Algorithm 4
Index Data
0 0 (dt)
A.4.6 Algorithm 5
Algorithm 5 retrieves initial data from the simulation so that they can be post-processed
and presented to the user. Algorithm 5 (pseudo-code presented in Pseudo-Code 12) consists
of the execution of a single QuantityTask (line 1 of Pseudo-Code 12). This algorithm has
only a task map, which is presented in Table A.22.
Pseudo-Code 12: Algorithm 5 pseudo-code.
1: groupVector[2] ComputeQuantityTask(taskMap(2,0))  // Retrieves initial data from the simulation
Table A.22: Task map of Algorithm 5
Group QuantityTask Actual QuantityTask
2 0 0
A.4.7 Algorithm 6
Algorithm 6 retrieves data during the simulation so that they can be post-processed and
presented to the user. Through this algorithm the user can monitor the evolution of the
simulation data. Algorithm 6 (pseudo-code presented in Pseudo-Code 13) consists of the
execution of a single QuantityTask (line 1 of Pseudo-Code 13). This algorithm has only a
task map, which is presented in Table A.23.
Pseudo-Code 13: Algorithm 6 pseudo-code.
1: groupVector[2] ComputeQuantityTask(taskMap(2,0))  // Retrieves data during the simulation
Table A.23: Task map of Algorithm 6
Group QuantityTask Actual QuantityTask
2 0 1
A.4.8 Algorithm 7
Algorithm 7 initializes the time data in each group using data from the system group.
First, the system group (Group 2) initializes its time data, that is, it sets the values of
tn, tnp1 and dt equal to zero (line 1 of Pseudo-Code 14). Then, line 2 shows that another
algorithm is called in order to obtain the minimum time step among the groups. All groups
but the system group update their time data with the value obtained by that algorithm
(lines 3 to 5), and the system data is updated with the same values (lines 7 to 9).
The task map, state map, system data map and block algorithm map of Algorithm 7 are
presented in Table A.24, Table A.25, Table A.26 and Table A.27 respectively.
Pseudo-Code 14: Algorithm 7 pseudo-code.
1: groupVector[2] ComputeQuantityTask(taskMap(2,0))  // Initialize all time data. dt = tn = tnp1 = 0
2: thisBlock ExecuteAlgorithm(blockAlgorithmMap(0))  // Calls an algorithm for initial time step computation
3: for int i = 0; i < 2; i++ do
4:     groupVector[i] ComputeQuantityTask(taskMap(i,0))  // Updates time data for Groups 0, 1 from SystemGroup
5: end
6: double[] timeData = groupVector[2] RetrieveState(stateMap(2,0))
7: systemDataUpdate(systemDataMap(0), timeData[0])  // dt
8: systemDataUpdate(systemDataMap(1), timeData[1])  // tn
9: systemDataUpdate(systemDataMap(2), timeData[2])  // tnp1
Table A.24: Task map of Algorithm 7
Group QuantityTask Actual QuantityTask
0 0 9
1 0 9
2 0 2
Table A.25: State map of Algorithm 7
Group State Actual State
2 0 25 (gSystemData2)
Table A.26: System data map of Algorithm 7
Index Data
0 0 (dt)
1 1 (tn)
2 2 (tnp1)
Table A.27: Block algorithm map of Algorithm 7
Index Algorithm
0 1
A.4.9 Algorithm 8
Algorithm 8 updates the time data in each group using data from the system group. First,
another algorithm is called in order to obtain the minimum time step among the groups
(line 1 of Pseudo-Code 15). Then, line 2 shows that the system group (Group 2) updates
its time data. All groups but the system group update their time data with the value of
the system group's time data (lines 3 to 5), and the system data is updated with the same
values (lines 7 to 9).
The task map, state map, system data map and block algorithm map of Algorithm 8
are presented in Table A.28, Table A.29, Table A.30 and Table A.31 respectively.
Pseudo-Code 15: Algorithm 8 pseudo-code.
1: thisBlock ExecuteAlgorithm(blockAlgorithmMap(0))  // Calls an algorithm for time step computation. Final time step is in SystemGroup only
2: groupVector[2] ComputeQuantityTask(taskMap(2,0))  // Updates time data for SystemGroup
3: for int i = 0; i < 2; i++ do
4:     groupVector[i] ComputeQuantityTask(taskMap(i,0))  // Updates time data for Groups 0, 1 from SystemGroup
5: end
6: double[] timeData = groupVector[2] RetrieveState(stateMap(2,0))
7: systemDataUpdate(systemDataMap(0), timeData[0])  // dt
8: systemDataUpdate(systemDataMap(1), timeData[1])  // tn
9: systemDataUpdate(systemDataMap(2), timeData[2])  // tnp1
Table A.28: Task map of Algorithm 8
Group QuantityTask Actual QuantityTask
0 0 9
1 0 9
2 0 1
Table A.29: State map of Algorithm 8
Group State Actual State
2 0 25 (gSystemData2)
Table A.30: System data map of Algorithm 8
Index Data
0 0 (dt)
1 1 (tn)
2 2 (tnp1)
Table A.31: Block algorithm map of Algorithm 8
Index Algorithm
0 4
A.5 QuantityTasks
A.5.1 QuantityTasks for Group 0
1. QuantityTask 0 - Initialize Un
(a) GroupTask 0 - type: assembler; assembler structure: Global State 0 (Un)
i. Phenomenon 1 (Elasticity)
A. Quantity 0 - coupling: none; parameters: none
2. QuantityTask 1 - Computes initial dt of this group
(a) GroupTask 1 - type: assembler; assembler structure: Global State 6 (gSystemData0)
i. Phenomenon 1 (Elasticity)
A. Quantity 1 - coupling: none; parameters: none
3. QuantityTask 2 - Initialize Uk for global solution iteration
(a) GroupTask 2 - type: operator
i. Phenomenon 0 (groupPhenomenon0)
A. Data - operation type: swap; operation structures: Global State 2
(Uk), Global State 0 (Un); parameters: none
4. QuantityTask 3 - Assembles matrix K and strength vector F for solving Ukp1
(a) GroupTask 3 - type: assembler; assembler structure: Global State 8 (K)
i. Phenomenon 1 (Elasticity)
A. Quantity 2 - coupling: none; parameters: none
(b) GroupTask 4 - type: assembler; assembler structure: Global State 9 (F)
i. Phenomenon 1 (Elasticity)
A. Quantity 3 - coupling: Phenomenon 0 (groupPhenomenon0) Global
State 6 (gSystemData0); parameters: none
B. Quantity 4 - coupling: Phenomenon 4 (Diffusion) Global State 12
(Ck); parameters: none
5. QuantityTask 4 - Introduces boundary conditions, modifying K and F
(a) GroupTask 5 - type: neummanBC; assembler structure: Global State 9 (F)
i. Phenomenon 1 (Elasticity)
A. Quantity 5.a - coupling: Phenomenon 0 (groupPhenomenon0) Global State 6 (gSystemData0); parameters: Neumann condition = 0 (complete Neumann condition)
B. Quantity 5.b - coupling: Phenomenon 0 (groupPhenomenon0) Global State 6 (gSystemData0); parameters: Neumann condition = 1 (Neumann condition in tangential direction)
C. Quantity 5.c - coupling: Phenomenon 0 (groupPhenomenon0) Global State 6 (gSystemData0); parameters: Neumann condition = 2 (Neumann condition in normal direction)
(b) GroupTask 6 - type: essentialBC; assembler structure: Global State 8 (K), Global State 9 (F)
i. Phenomenon 1 (Elasticity)
A. Quantity 6.a - coupling: Phenomenon 0 (groupPhenomenon0) Global State 6 (gSystemData0); parameters: Dirichlet condition = 0 (complete Dirichlet condition)
B. Quantity 6.b - coupling: Phenomenon 0 (groupPhenomenon0) Global State 6 (gSystemData0); parameters: Dirichlet condition = 1 (Dirichlet condition in tangential direction)
C. Quantity 6.c - coupling: Phenomenon 0 (groupPhenomenon0) Global State 6 (gSystemData0); parameters: Dirichlet condition = 2 (Dirichlet condition in normal direction)
6. QuantityTask 5 - Computes Ukp1 using matrix K and strength vector F
(a) GroupTask 7 - type: solver
i. Phenomenon 0 (groupPhenomenon0)
A. Data - operation type: solver; operation structures: Global State 8
(K), Global State 9 (F), Global State 3 (Ukp1); parameters: depends
on the solver
7. QuantityTask 6 - Computes trSig (trace of the strain tensor)
(a) GroupTask 8 - type: assembler; assembler structure: Global State 7 (trSig)
i. Phenomenon 1 (Elasticity)
A. Quantity 7 - coupling: Phenomenon 1 (Elasticity) Global State 3
(Ukp1); parameters: none
8. QuantityTask 7 - Computes errU (see the sketch after this list)
(a) GroupTask 9 - type: operator
i. Phenomenon 0 (groupPhenomenon0)
A. Data - operation type: subtract; operation structures: Global State 2
(Uk), Global State 3 (Ukp1), Global State 2 (Uk); parameters: none
(b) GroupTask 10 - type: operator
i. Phenomenon 0 (groupPhenomenon0)
A. Data - operation type: vecNorm2; operation structures: Global State
5 (errU), Global State 2 (Uk); parameters: none
(c) GroupTask 11 - type: operator
i. Phenomenon 0 (groupPhenomenon0)
A. Data - operation type: vecNorm2; operation structures: Global State
22 (aux1), Global State 3 (Ukp1); parameters: none
(d) GroupTask 12 - type: operator
i. Phenomenon 0 (groupPhenomenon0)
A. Data - operation type: scalarDivide; operation structures: Global
State 5 (errU), Global State 5 (errU), Global State 22 (aux1); pa-
rameters: none
9. QuantityTask 8 - Computes dt of this group after the beginning
(a) GroupTask 13 - type: assembler; assembler structure: Global State 6 (gSystemData0)
i. Phenomenon 1 (Elasticity)
A. Quantity 8 - coupling: none; parameters: none
10. QuantityTask 9 - Updates gSystemData0
(a) GroupTask 14 - type: assembler; assembler structure: Global State 6 (gSystemData0)
i. Phenomenon 1 (Elasticity)
A. Quantity 9 - coupling: Phenomenon 4 (groupPhenomenon2) Global
State 25 (gSystemData2); parameters: none
11. QuantityTask 10 - Updates Uk value for the next iteration
(a) GroupTask 15 - type: operator
i. Phenomenon 0 (groupPhenomenon0)
A. Data - operation type: swap; operation structures: Global State 2
(Uk), Global State 3 (Ukp1); parameters: none
12. QuantityTask 11 - Updates Unp1 value for the next iteration
(a) GroupTask 16 - type: operator
i. Phenomenon 0 (groupPhenomenon0)
A. Data - operation type: swap; operation structures: Global State 1
(Unp1), Global State 2 (Uk); parameters: none
13. QuantityTask 12 - Computes trS
(a) GroupTask 17 - type: assembler; assembler structure: Global State 20 (trS)
i. Phenomenon 1 (Elasticity)
A. Quantity 10 - coupling: Phenomenon 1 (Elasticity) Global State 7
(trSig); parameters: none
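The sketch announced in QuantityTask 7 above makes the chain of operator GroupTasks
concrete: the same four steps (subtract, two vecNorm2 and one scalarDivide) are reproduced
below on plain std::vector data. The helper name vecNorm2 and the sample values are
illustrative assumptions; only the sequence of operations and the in-place overwrite of Uk
follow the GroupTask descriptions above.

#include <cmath>
#include <iostream>
#include <vector>

// Euclidean norm, the role played by the vecNorm2 operator.
double vecNorm2(const std::vector<double>& v) {
    double s = 0.0;
    for (double x : v) s += x * x;
    return std::sqrt(s);
}

int main() {
    std::vector<double> Uk   = {1.00, 2.00, 3.00};  // Global State 2 (Uk)
    std::vector<double> Ukp1 = {1.01, 1.98, 3.02};  // Global State 3 (Ukp1)

    // GroupTask 9 (subtract): Uk := Ukp1 - Uk, overwriting Uk with the difference.
    for (std::size_t i = 0; i < Uk.size(); ++i) Uk[i] = Ukp1[i] - Uk[i];

    // GroupTask 10 (vecNorm2): errU := norm of the difference.
    double errU = vecNorm2(Uk);

    // GroupTask 11 (vecNorm2): aux1 := norm of Ukp1.
    double aux1 = vecNorm2(Ukp1);

    // GroupTask 12 (scalarDivide): errU := errU / aux1.
    errU /= aux1;

    std::cout << "errU = " << errU << "\n";
    return 0;
}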
A.5.2 QuantityTasks for Group 1
1. QuantityTask 0 - Initialize Cn
(a) GroupTask 0 - type: assembler; assembler structure: Global State 10 (Cn)
i. Phenomenon 3 (Diffusion)
A. Quantity 0 - coupling: none; parameters: none
2. QuantityTask 1 - Computes initial dt of this group
(a) GroupTask 1 - type: assembler; assembler structure: Global State 16 (gSystemData1)
i. Phenomenon 3 (Diffusion)
A. Quantity 1 - coupling: none; parameters: none
3. QuantityTask 2 - Initialize Ck
(a) GroupTask 2 - type: operator
i. Phenomenon 2 (groupPhenomenon1)
A. Data - operation type: swap; operation structures: Global State 12
(Ck), Global State 10 (Cn); parameters: none
4. QuantityTask 3 - Assembles matrix M and strength vector G for solving Ckp1 (see the note after this list)
(a) GroupTask 3 - type: assembler; assembler structure: Global State 17 (M)
i. Phenomenon 3 (Diffusion)
A. Quantity 2 - coupling: none; parameters: none
(b) GroupTask 4 - type: assembler; assembler structure: Global State 20 (trS)
i. Phenomenon 3 (Diffusion)
A. Quantity 3 - coupling: Phenomenon 1 (Elasticity) Global State 7
(trSig), Phenomenon 3 (Diffusion) Global State 12 (Ck); parameters:
none
(c) GroupTask 5 - type: assembler; assembler structure: Global State 19 (Kc)
i. Phenomenon 3 (Diffusion)
A. Quantity 4 - coupling: none; parameters: none
B. Quantity 5 - coupling: Phenomenon 3 (Diffusion) Global State 20
(trS); parameters: none
(d) GroupTask 6 - type: assembler; assembler structure: Global State 18 (G)
i. Phenomenon 3 (Diffusion)
A. Quantity 6 - coupling: Phenomenon 2 (groupPhenomenon1) Global
State 16 (gSystemData1); parameters: none
(e) GroupTask 7 - type: operator
i. Phenomenon 2 (groupPhenomenon1)
A. Data - operation type: matVecMult; operation structures: Global
State 21 (aux0), Global State 17 (M), Global State 10 (Cn); parame-
ters: multiplication factor = Global State 16 (gSystemData1), inverse
= true
(f) GroupTask 8 - type: operator
i. Phenomenon 2 (groupPhenomenon1)
A. Data - operation type: addVec; operation structures: Global State 18
(G), Global State 18 (G), Global State 21 (aux0); parameters: none
(g) GroupTask 9 - type: operator
i. Phenomenon 2 (groupPhenomenon1)
A. Data - operation type: matLinearComb; operation structures: Global
State 17 (M), Global State 17 (M), Global State 19 (Kc); parameters:
first parameter = Global State 16 (gSystemData1), inverse = true,
second parameter = 1, inverse = false
5. QuantityTask 4 - Introduces boundary conditions, modifying M and G
(a) GroupTask 10 - type: neummanBC; assembler structure: Global State 18 (G)
i. Phenomenon 3 (Diffusion)
A. Quantity 7 - coupling: Phenomenon 2 (groupPhenomenon1) Global
State 16 (gSystemData1); parameters: none
(b) GroupTask 11 - type: essentialBC; assembler structure: Global State 17 (M),
Global State 18 (G)
i. Phenomenon 3 (Diffusion)
A. Quantity 8 - coupling: Phenomenon 2 (groupPhenomenon1) Global
State 16 (gSystemData1); parameters: none
6. QuantityTask 5 - Computes Ckp1 using matrix M and strength vector G
(a) GroupTask 12 - type: solver
i. Phenomenon 2 (groupPhenomenon1)
A. Data - operation type: solver; operation structures: Global State 17
(M), Global State 18 (G), Global State 13 (Ckp1); parameters: de-
pends on the solver
7. QuantityTask 6 - Prepares data for residue computation
(a) GroupTask 3 - type: assembler; assembler structure: Global State 17 (M)
i. Phenomenon 3 (Diffusion)
A. Quantity 2 - coupling: none; parameters: none
(b) GroupTask 13 - type: assembler; assembler structure: Global State 20 (trS)
i. Phenomenon 3 (Diffusion)
A. Quantity 9 - coupling: Phenomenon 1 (Elasticity) Global State 7 (trSig),
Phenomenon 3 (Diffusion) Global State 13 (Ckp1); parameters:
none
(c) GroupTask 5 - type: assembler; assembler structure: Global State 19 (Kc)
i. Phenomenon 3 (Diffusion)
A. Quantity 4 - coupling: none; parameters: none
B. Quantity 5 - coupling: Phenomenon 3 (Diffusion) Global State 20
(trS); parameters: none
(d) GroupTask 6 - type: assembler; assembler structure: Global State 18 (G)
i. Phenomenon 3 (Diusion)
A. Quantity 6 - coupling: Phenomenon 2 (groupPhenomenon1) Global
State 16 (gSystemData1); parameters: none
(e) GroupTask 7 - type: operator
i. Phenomenon 2 (groupPhenomenon1)
A. Data - operation type: matVecMult; operation structures: Global
State 21 (aux0), Global State 17 (M), Global State 10 (Cn); parame-
ters: multiplication factor: Global State 16 (gSystemData1), inverse
= true
(f) GroupTask 8 - type: operator
i. Phenomenon 2 (groupPhenomenon1)
A. Data - operation type: addVec; operation structures: Global State 18
(G), Global State 18 (G), Global State 21 (aux0); parameters: none
(g) GroupTask 9 - type: operator
i. Phenomenon 2 (groupPhenomenon1)
A. Data - operation type: matLinearComb; operation structures: Global
State 17 (M), Global State 17 (M), Global State 19 (Kc); parameters:
first parameter = Global State 16 (gSystemData1), inverse = true,
second parameter = 1, inverse = false
8. QuantityTask 4 - Introduces boundary conditions, modifying M and G
(a) GroupTask 10 - type: neummanBC; assembler structure: Global State 18 (G)
i. Phenomenon 3 (Diffusion)
A. Quantity 7 - coupling: Phenomenon 2 (groupPhenomenon1) Global
State 16 (gSystemData1); parameters: none
(b) GroupTask 11 - type: essentialBC; assembler structure: Global State 17 (M),
Global State 18 (G)
i. Phenomenon 3 (Diffusion)
A. Quantity 8 - coupling: Phenomenon 2 (groupPhenomenon1) Global
State 16 (gSystemData1); parameters: none
9. QuantityTask 7 - Computes errC
(a) GroupTask 14 - type: operator
i. Phenomenon 2 (groupPhenomenon1)
A. Data - operation type: matVecMult; operation structures: Global
State 21 (aux0), Global State 17 (M), Global State 13 (Ckp1); pa-
rameters: none
(b) GroupTask 15 - type: operator
i. Phenomenon 2 (groupPhenomenon1)
A. Data - operation type: vecSubtract; operation structures: Global
State 21 (aux0), Global State 21 (aux0), Global State 18 (G); pa-
rameters: none
(c) GroupTask 16 - type: operator
i. Phenomenon 2 (groupPhenomenon1)
A. Data - operation type: vecNorm2; operation structures: Global State
24 (aux3), Global State 21 (aux0); parameters: none
(d) GroupTask 17 - type: operator
i. Phenomenon 2 (groupPhenomenon1)
A. Data - operation type: subtract; operation structures: Global State
12 (Ck), Global State 13 (Ckp1), Global State 12 (Ck); parameters:
none
(e) GroupTask 18 - type: operator
i. Phenomenon 2 (groupPhenomenon1)
A. Data - operation type: vecNorm2; operation structures: Global State
15 (errC), Global State 12 (Ck); parameters: none
(f) GroupTask 19 - type: operator
i. Phenomenon 2 (groupPhenomenon1)
A. Data - operation type: vecNorm2; operation structures: Global State
23 (aux2), Global State 13 (Ckp1); parameters: none
(g) GroupTask 20 - type: operator
i. Phenomenon 2 (groupPhenomenon1)
A. Data - operation type: scalarMax; operation structures: Global State
15 (errC), Global State 15 (errC), Global State 24 (aux3); parameters:
multiplication factor = Global State 23 (aux2), inverse = true
10. QuantityTask 8 - Computes dt of this group after the beginning
(a) GroupTask 21 - type: assembler; assembler structure: Global State 16 (gSystemData1)
i. Phenomenon 3 (Diffusion)
A. Quantity 10 - coupling: none; parameters: none
11. QuantityTask 9 - Updates gSystemData1
(a) GroupTask 22 - type: assembler; assembler structure: Global State 16 (gSystemData1)
i. Phenomenon 3 (Diffusion)
A. Quantity 11 - coupling: Phenomenon 4 (groupPhenomenon2) Global
State 25 (gSystemData2); parameters: none
12. QuantityTask 10 - Updates Cnp1 value for the next iteration
(a) GroupTask 23 - type: operator
i. Phenomenon 2 (groupPhenomenon1)
A. Data - operation type: swap; operation structures: Global State 11
(Cnp1), Global State 12 (Ck); parameters: none
13. QuantityTask 11 - Updates Ck value for the next iteration
(a) GroupTask 24 - type: operator
i. Phenomenon 2 (groupPhenomenon1)
A. Data - operation type: swap; operation structures: Global State 12
(Ck), Global State 13 (Ckp1); parameters: none
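As noted in QuantityTask 3 above, reading GroupTasks 7 to 9 together suggests the linear
system that QuantityTask 5 eventually solves. Assuming that the multiplication factor taken
from Global State 16 (gSystemData1) is the time step dt (the tables only name the global
state, so this is an assumption) and writing Delta t for it, the matLinearComb with inverse
= true scales M by 1/Delta t and adds Kc, while the matVecMult followed by addVec adds the
corresponding history term to G, giving

\[
  \Big( \tfrac{1}{\Delta t}\, M + K_{c} \Big)\, C_{k+1}
  \;=\;
  G + \tfrac{1}{\Delta t}\, M\, C_{n} ,
\]

which is consistent with an implicit time discretization of a diffusion-type equation for C.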
A.5.3 QuantityTasks for Group 2
1. QuantityTask 0 - Computes minimum time step
(a) GroupTask 0 - type: assembler; assembler structure: Global State 25 (gSystemData2)
i. Phenomenon 4 (groupPhenomenon2)
A. Quantity 0 - coupling: Phenomenon 0 (groupPhenomenon0) Global
State 6(gSystemData0), Phenomenon 2 (groupPhenomenon1) Global
State 16 (gSystemData1); parameters: none
2. QuantityTask 1 - Updates time data for next time step computation
(a) GroupTask 1 - type: assembler; assembler structure: Global State 25 (gSystemData2)
i. Phenomenon 4 (groupPhenomenon2)
A. Quantity 1 - coupling: none; parameters: none
3. QuantityTask 2 - Initialize time data
(a) GroupTask 2 - type: assembler; assembler structure: Global State 25 (gSystemData2)
i. Phenomenon 4 (groupPhenomenon2)
A. Quantity 2 - coupling: none; parameters: none
A.5.4 QuantityTasks for Group 3
1. QuantityTask 0 - Retrieves initial data from the simulation
(a) GroupTask 0 - type: posProc
i. Phenomenon 6 (PostProcessing)
A. Quantity 0 - coupling: Phenomenon 1 (Elasticity); parameters: none
B. Quantity 1 - coupling: Phenomenon 3 (Diffusion); parameters: none
C. Quantity 2 - coupling: Phenomenon 4 (groupPhenomenon2); parameters: none
2. QuantityTask 1 - Retrieves data during the simulation
(a) GroupTask 0 - type: posProc
i. Phenomenon 6 (PostProcessing)
A. Quantity 0 - coupling: Phenomenon 1 (Elasticity); parameters: none
B. Quantity 1 - coupling: Phenomenon 3 (Diffusion); parameters: none
C. Quantity 2 - coupling: Phenomenon 4 (groupPhenomenon2); parameters: none