Universidade de Pernambuco
Escola Politécnica de Pernambuco
Departamento de Sistemas e Computação
Programa de Pós-Graduação em Engenharia da Computação
Renata Wanderley Medeiros
Scheduling Parallel Jobs for Multiphysics
Simulators
Dissertação de Mestrado
Recife, 28 de janeiro 2010
UNIVERSIDADE DE PERNAMBUCO
DEPARTAMENTO DE SISTEMAS E COMPUTAÇÃO
PROGRAMA DE PÓS-GRADUAÇÃO EM ENGENHARIA DA COMPUTAÇÃO
RENATA WANDERLEY MEDEIROS
Scheduling Parallel Jobs for Multiphysics
Simulators
Thesis presented in partial fulfillment
of the requirements for the degree of
Master of Computer Science
Prof. Dr. Ricardo Massa Ferreira Lima
Advisor
Prof. Dr. Felix Christian Guimarães Santos
Coadvisor
Recife, 28 de janeiro 2010
CIP – CATALOGING-IN-PUBLICATION
Medeiros, Renata Wanderley
Scheduling Parallel Jobs for Multiphysics Simulators / Renata
Wanderley Medeiros. – Recife: PPGEC da UPE, 2010.
125 f.: il.
Thesis (Master) – Universidade de Pernambuco. Programa de
Pós-Graduação em Engenharia da Computação, Recife, BR–PE,
2010. Advisor: Ricardo Massa Ferreira Lima; Coadvisor: Felix
Christian Guimarães Santos.
1. Scheduling. 2. Genetic Algorithms. 3. Coloured Petri Nets.
4. MPhyScaS. I. Lima, Ricardo Massa Ferreira. II. Santos, Felix
Christian Guimarães. III. Título.
To my parents and aunt, who emphasized the importance
of education and helped me with my lessons throughout their lives.
And to my husband, who has been proud of my work and
who has shared the many challenges for completing this dissertation.
Acknowledgment
It is a pleasure to thank those who made this dissertation possible, such as my parents,
Olavio and Ione Medeiros, and my aunt, Orquidia, who always gave me affection and
attention and also gave me the moral support to make me who I am.
I am heartily thankful to my husband, Danilo Carvalho, for his help with the emotional
and intellectual portions of this dissertation.
To my sister, Roberta, and her family, Adriano and Luísa, a steady presence in my life,
for the happiness and understanding always given to me.
I also would like to make a special reference to Professor Ricardo Massa, who still
contributes to my academic life, providing knowledge and guiding me through my studies.
In addition, I thank Professors Felix Christian and Sérgio Soares, who introduced
me to the MPhyScaS project and enabled me to develop an understanding of the subject.
To all other Professors of the Department of Computing and Systems of Pernambuco
University, who gave me the best instruction during the master's course.
Lastly, I offer my regards and blessings to my colleagues César Lins and Fernando
Rocha, and to all of those who supported me in any respect during the completion of the
project.
Contents
LIST OF FIGURES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vii
RESUMO . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ix
ABSTRACT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . x
1 INTRODUCTION . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.1 Algorithms Used in Scheduling . . . . . . . . . . . . . . . . . . . . . . . . 2
1.2 The MPhyScaS problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.3 Methodology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.4 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.5 Organization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
2 MPHYSCAS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
2.1 MPhyScaS Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2.2 MPhyScaS Parallel Architecture . . . . . . . . . . . . . . . . . . . . . . . 13
3 SCHEDULING ALGORITHMS . . . . . . . . . . . . . . . . . . . . . . . . 19
3.1 Classification of Scheduling Problems . . . . . . . . . . . . . . . . . . . . 20
3.1.1 Machine Environment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
3.1.2 Job Characteristics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
3.1.3 Optimality Criterion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
3.2 Compared Scheduling Algorithms . . . . . . . . . . . . . . . . . . . . . . 24
3.2.1 List Scheduling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
3.2.2 Longest Processing Time - LPT . . . . . . . . . . . . . . . . . . . . . . . . 26
3.2.3 Shortest Processing Time - SPT . . . . . . . . . . . . . . . . . . . . . . . . 27
4 IDENTIFYING JOB DEPENDENCIES . . . . . . . . . . . . . . . . . . . 33
4.1 Petri Nets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
4.2 Coloured Petri Nets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
4.3 MPhyScaS - Petri Net Model . . . . . . . . . . . . . . . . . . . . . . . . . 40
4.4 Model Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
5 PROPOSED ALGORITHM . . . . . . . . . . . . . . . . . . . . . . . . . . 55
5.1 Genetic Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
5.1.1 Initial Population . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
5.1.2 Selection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
5.1.3 Crossover . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
5.1.4 Mutation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
5.1.5 Elitism . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
5.1.6 Termination . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
5.2 Proposed Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
5.2.1 Individual Representation . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
5.2.2 Selection and Elitism . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
5.2.3 Crossover . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
5.2.4 Mutation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
5.2.5 Fitness Function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
6 CASE STUDY . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
6.1 Basics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
6.2 Kernel . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
6.2.1 System Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
6.3 Global States . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
6.4 Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
6.5 QuantityTasks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
7 EXPERIMENTS AND RESULTS . . . . . . . . . . . . . . . . . . . . . . . 76
7.1 The First Abstract Simulator . . . . . . . . . . . . . . . . . . . . . . . . . 77
7.2 The Second Abstract Simulator . . . . . . . . . . . . . . . . . . . . . . . . 80
7.3 Case Study . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
8 CONCLUSIONS AND FUTURE WORKS . . . . . . . . . . . . . . . . . . 86
8.1 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
8.2 Future Works . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
APPENDIX A CASE STUDY . . . . . . . . . . . . . . . . . . . . . . . . . . . 90
A.1 Kernel Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90
A.2 System Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
A.3 Global States . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
A.4 Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92
A.4.1 Algorithm 0 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92
A.4.2 Algorithm 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94
A.4.3 Algorithm 2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94
A.4.4 Algorithm 3 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98
A.4.5 Algorithm 4 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
A.4.6 Algorithm 5 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103
A.4.7 Algorithm 6 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104
A.4.8 Algorithm 7 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104
A.4.9 Algorithm 8 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 106
A.5 QuantityTasks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107
A.5.1 QuantityTasks for Group 0 . . . . . . . . . . . . . . . . . . . . . . . . . . 107
A.5.2 QuantityTasks for Group 1 . . . . . . . . . . . . . . . . . . . . . . . . . . 111
A.5.3 QuantityTasks for Group 2 . . . . . . . . . . . . . . . . . . . . . . . . . . 117
A.5.4 QuantityTasks for Group 3 . . . . . . . . . . . . . . . . . . . . . . . . . . 118
REFERENCES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119
List of Figures
Figure 1.1: An overview of the whole process proposed. . . . . . . . . . . . . . 5
Figure 2.1: Computational representation for the layers of the simulator. . . . . . 9
Figure 2.2: Each Phenomenon is capable of computing a set of quantities. . . . . . 12
Figure 2.3: Each Phenomenon has its own set of States, which is stored in its Group. 12
Figure 2.4: A quantity computed by a Phenomenon may be coupled to a State
from another Phenomenon. . . . . . . . . . . . . . . . . . . . . . . . 13
Figure 2.5: Hierarchy of the simulator in MPhyScaS-P. . . . . . . . . . . . . . . 17
Figure 2.6: Layers with procedures executed by ClusterRank in MPhyScaS-P. . . 17
Figure 2.7: Layers with procedures executed by MachineRank in MPhyScaS-P. . 18
Figure 2.8: Layers with procedures executed by ProcessRank in MPhyScaS-P. . . 18
Figure 3.1: Example of a job shop scheduling problem. . . . . . . . . . . . . . . 22
Figure 3.2: Example of a flow shop scheduling problem. . . . . . . . . . . . . . 22
Figure 3.3: Example of an open shop scheduling problem. . . . . . . . . . . . . . 22
Figure 3.4: Example of LPT schedules. . . . . . . . . . . . . . . . . . . . . . . 31
Figure 3.5: Example of SPT schedules. . . . . . . . . . . . . . . . . . . . . . . . 32
Figure 4.1: An illustration of a transition (firing) rule: (a) the marking before
firing the enabled transition t; (b) the marking after firing t, where t is
disabled. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
Figure 4.2: Graphical representation of Petri nets PN1 and PN2. . . . . . . . . . . 42
Figure 4.3: Graphical representation of Petri nets PN3 and PN4. . . . . . . . . . . 43
Figure 4.4: Graphical representation of Petri nets PN5, PN6 and PN7. . . . . . . . 45
Figure 4.5: Graphical representation of Petri nets PN8 and PN9. . . . . . . . . . . 46
Figure 4.6: Graphical representation of Petri nets PN10, PN11 and PN12. . . . . . 48
Figure 4.7: Graphical representation of Petri nets PN13 and PN14. . . . . . . . . . 50
Figure 4.8: An example of a Petri net that represents the execution of two Group-
Tasks. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
Figure 4.9: The Petri net with arcs direction inverted. . . . . . . . . . . . . . . . 52
Figure 4.10: The Petri net representing the execution of the first executed task. . . 53
Figure 4.11: The Petri net representing the execution of the second executed task. . 54
Figure 5.1: Fitness-proportionate selection example. . . . . . . . . . . . . . . . . 59
Figure 5.2: Tournament selection example. . . . . . . . . . . . . . . . . . . . . . 60
Figure 5.3: Single-point crossover example. . . . . . . . . . . . . . . . . . . . . 61
Figure 5.4: Multipoint crossover example. . . . . . . . . . . . . . . . . . . . . . 61
Figure 5.5: Uniform crossover example. . . . . . . . . . . . . . . . . . . . . . . 62
Figure 5.6: Mutation example. . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
Figure 5.7: An example of a scheduling individual. . . . . . . . . . . . . . . . . 65
Figure 5.8: First step of crossover individuals. . . . . . . . . . . . . . . . . . . . 66
Figure 5.9: Second step of crossover individuals. . . . . . . . . . . . . . . . . . 66
Figure 5.10: First step of crossover individuals that generates unfeasible offspring. 67
Figure 5.11: Second step of crossover individuals that generates unfeasible offspring. 67
Figure 5.12: An example of the proposed mutation. . . . . . . . . . . . . . . . . . 68
Figure 5.13: An example of the proposed individual representation. . . . . . . . . 69
Figure 7.1: The convergence of the proposed genetic algorithm for 50 and 100
individuals using first architecture. . . . . . . . . . . . . . . . . . . . 79
Figure 7.2: The convergence of the proposed genetic algorithm for 50 and 100
individuals using second architecture. . . . . . . . . . . . . . . . . . 80
Figure 7.3: The convergence of the proposed genetic algorithm for 50 and 100
individuals using the first architecture. . . . . . . . . . . . . . . . . . 81
Figure 7.4: The convergence of the proposed genetic algorithm for 50 and 100
individuals using the second architecture. . . . . . . . . . . . . . . . 83
Figure 7.5: The convergence of the proposed genetic algorithm for 50 and 100
individuals using the first architecture. . . . . . . . . . . . . . . . . . 85
Figure 7.6: The convergence of the proposed genetic algorithm for 50 and 100
individuals using the second architecture. . . . . . . . . . . . . . . . 85
Resumo
Simulações de problemas reais envolvendo fenômenos físicos podem exigir muito tempo
para serem executados. Para melhorar o desempenho dessas simulações é necessário ter
uma abordagem de paralelização dos processos que compõem a simulação.
MPhyScaS (Multi-Physics and Multi-Scale Solver Environment) é um ambiente de-
dicado ao desenvolvimento automático de simuladores. Cada simulação do MPhyScaS
exige muito tempo de execução. Para paralelizar as simulações do MPhyScaS a abor-
dagem usada deve fazer uso do conceito de camadas definido na arquitetura do MPhyScaS
com o intuito de definir uma estrutura de paralelização hierárquica.
O objetivo deste trabalho é melhorar o desempenho das simulações do MPhyScaS,
que são compostas por um conjunto de tarefas dependentes umas das outras, escalonando
essas tarefas. O modelo apresentado é baseado em Algoritmos Genéticos (GA – Genetic
Algorithms) para escalonar tarefas paralelas de acordo com as restrições de dependências
impostas pela arquitetura do MPhyScaS. A comunicação entre os processadores tam-
bém deve ser considerada neste processo. Portanto, deve-se achar um equilíbrio entre a
execução dos processos e o tempo necessário para que esses processos se comuniquem.
A técnica de GA geralmente é simples, mas para ser aplicada em problemas de escalon-
amento deve ter informações a respeito das dependências entre processos. Para obter es-
sas informações outra técnica chamada Redes de Petri Coloridas (CPN – Coloured Petri
Nets) foi utilizada. Esta última técnica modela os dados de uma simulação do MPhyScaS
e analisa o modelo gerado identificando os processos paralelos e os dependentes.
Vários experimentos, incluindo um estudo de caso, foram realizados utilizando as
duas técnicas mencionadas. Esses experimentos produziram resultados satisfatórios, o
que demonstra a viabilidade da nossa contribuição e sua capacidade de ser aplicada aos
problemas do MPhyScaS.
Keywords: Scheduling, Genetic Algorithms, Coloured Petri Nets, MPhyScaS.
Abstract
Scheduling Parallel Jobs for Multiphysics Simulators
Real problem simulations involving physical phenomena can demand a great deal of exe-
cution time. To improve the performance of these simulations it is necessary to have an
approach to parallelize the processes that compose the simulation.
MPhyScaS (Multi-Physics and Multi-Scale Solver Environment) is an environment
dedicated to the automatic development of simulators. Each MPhyScaS simulation de-
mands a great amount of time. To parallelize MPhyScaS simulations, the approach used
should make use of the concept of layers already used in MPhyScaS in order to define a
hierarchical parallel structure.
The aim of the work herein presented is to improve the performance of clusters in the
processing of MPhyScaS simulations, which are composed of a set of dependent tasks,
by scheduling these tasks. The presented model is based on Genetic Algorithms (GA) to
schedule the parallel tasks following the dependence restrictions of the MPhyScaS architecture.
The communication between processors must also be considered in this scheduling. There-
fore, a trade-off must be found between the execution of processes and the time necessary
for these processes to communicate with each other.
GA are generally simple, but to be applied to scheduling problems they have to know
about the dependencies between the processes. In order to obtain this information, we
use another technique called Coloured Petri Nets (CPN). The latter technique models
MPhyScaS simulation data and analyzes the model, identifying the parallel processes and
the dependent ones.
Several experiments, including a case study, were carried out making use of both
techniques. These experiments produced satisfactory results that demonstrate how useful
our contribution is and its capacity to be applied to MPhyScaS.
Keywords: Scheduling, Genetic Algorithms, Coloured Petri Nets, MPhyScaS.
Chapter 1
Introduction
Scheduling is a decision-making process that is used on a regular basis in many systems.
It deals with the allocation of resources to tasks over given time periods and its goal is
to optimize one or more objectives. Scheduling, as a decision-making process, plays
an important role in most systems as well as in most information processing environ-
ments (PINEDO, 2008). For the type of system herein presented (computing system), the
resources are the processors available for execution, and the tasks are the processes that
must be executed. A process is a part of a running software program or other computing
operation that does a single task.
The first scheduling algorithms were formulated in the mid-fifties. Since then there has
been a growing interest in scheduling. During the seventies, computer scientists discovered
scheduling as a tool for improving the performance of computer systems (BRUCKER,
2006). Furthermore, scheduling problems have been investigated and classified with re-
spect to their computational complexity. During the last few years, new and interesting
scheduling problems have been formulated in connection with flexible manufacturing.
While sequential formulation of a problem can lead to a solution, a tremendous perfor-
mance advantage is available from doing many operations in parallel. The two principal
approaches to speed up a computation are a faster clock rate for the underlying hard-
ware and doing more operations in parallel. Introducing parallel operations to speed up
an application is a promising approach, because as tasks become larger, more operations
can potentially be done in parallel. To explore this potential, three things must work
together: i) algorithms must involve many independent operations; ii) programming lan-
guages must allow the specification of parallel operations or identify them automatically;
iii) the architecture of the computer running the program must execute multiple operations
simultaneously (JORDAN; ALAGHBAND, 2003).
An effective way to extract the most performance and scalability out of a parallel
application is to exploit both task parallelism and data parallelism (AIDA; CASANOVA,
2008). With this approach, which is typically termed mixed parallelism (RAMASWAMY;
SAPATNEKAR; BANERJEE, 1997), an application consists of tasks organized in a
Directed Acyclic Graph (DAG) in which each edge corresponds to a precedence relation
between two tasks, implying possible data communication. The common approach to
schedule DAGs is the task parallel paradigm, which assigns one task to one processor.
The scheduling consists of the distribution of the DAG nodes among the machine nodes,
so that the makespan is minimized.
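As a minimal illustration, the sketch below (Python; the task names, durations and allocation are invented) represents an application as a DAG and computes the makespan of a given task-to-processor assignment by starting each task as early as its predecessors and its processor allow. Communication costs are ignored here for brevity, and tasks on the same processor are simply executed in topological order.

# A minimal sketch with hypothetical data: tasks organised in a DAG, one task per processor.
# duration[t] is the task weight; deps[t] lists the tasks that must finish before t starts.
duration = {"A": 3, "B": 2, "C": 4, "D": 1}
deps = {"A": [], "B": ["A"], "C": ["A"], "D": ["B", "C"]}
alloc = {"A": 0, "B": 0, "C": 1, "D": 0}           # task -> processor

def topological_order(deps):
    order, seen = [], set()
    def visit(t):
        if t not in seen:
            seen.add(t)
            for d in deps[t]:
                visit(d)
            order.append(t)
    for t in deps:
        visit(t)
    return order

def makespan(duration, deps, alloc):
    finish = {}                                     # task -> finish time
    proc_free = {}                                  # processor -> time it becomes free
    for task in topological_order(deps):
        ready = max((finish[d] for d in deps[task]), default=0)
        start = max(ready, proc_free.get(alloc[task], 0))
        finish[task] = start + duration[task]
        proc_free[alloc[task]] = finish[task]
    return max(finish.values())

print(makespan(duration, deps, alloc))              # prints 8 for the data above
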
Early scheduling algorithms did not take communication into account, but due to the
increasing gap between computation and communication performance of parallel systems,
the consideration of communication became more important (SINNEN; SOUSA,
2001) and is included in recently proposed scheduling algorithms (KWOK; AHMAD,
1998).
1.1 Algorithms Used in Scheduling
The problem of optimally assigning jobs to the machines in a grid has been shown,
in general, to be NP-complete. Since an exhaustive search is often impractical, most of the
work has been done on fast heuristic methods that find suboptimal solutions (KANG et al.,
2008). The most studied heuristic methods for scheduling problems are List Scheduling
(SIMION et al., 2007), Longest Processing Time (LPT) (BLAZEWICZ et al., 2007),
and Shortest Processing Time (SPT) (LEUNG; KELLY; ANDERSON, 2004). These
heuristics are of the constructive type. They start without a schedule and gradually
construct a schedule by adding one job at a time.
To achieve better solution quality, modern meta-heuristics have been presented for
the scheduling problem, such as Genetic Algorithms (MOATTAR; RAHMANI; DERAKHSHI,
2007), Simulated Annealing (XU; SUN; YANG, 2007), and Tabu Search (CHEN;
LIN, 2000). These algorithms are of the improvement type, which is conceptually
completely different from algorithms of the constructive type. They start out with a complete
schedule, which may be selected arbitrarily, and then try to obtain a better schedule by
manipulating the current schedule.
Among these methods, GA (Genetic Algorithms), a meta-heuristic and optimization
technique, has emerged as a tool that is beneficial for a variety of fields of study, including
construction applications, since its introduction in the 1960s by Holland (HOLLAND,
1992). Several studies have been done to solve the scheduling problem using GA (KIM,
2007). GA has also been used successfully to solve construction management problems,
including resource scheduling with a small number of activities (HEGAZ; KASSAB,
2003).
The reason for GA's success at a wide and ever-growing range of scheduling problems
is a combination of power and flexibility (MONTANA et al., 1998). The power derives
from the empirically proven ability of evolutionary algorithms to efficiently find globally
competitive optima in large and complex search spaces. The favorable scaling of
evolutionary algorithms as a function of the search space makes them particularly effective
in comparison with other search algorithms for the large spaces typical of real-world
scheduling (MONTANA et al., 1998).
The flexibility of Genetic Algorithms has multiple facets. Even the standard Genetic
Algorithm (i.e., bit string representation with traditional crossover and mutation operators)
can effectively handle problems that many traditional optimization algorithms cannot.
Some examples of these problems are: discrete spaces; nonlinear, discontinuous
evaluation functions; and nonlinear, discontinuous constraints (BANZHAF et al., 1998). The
use of non-standard Genetic Algorithms and the tailoring of the representation, operators,
initialization method, etc. to fit the problem greatly increases the range of problems to
which GA can be effectively applied.
1.2 The MPhyScaS problem
MPhyScaS (Multi-Physics and Multi-Scale Solver Environment) is an environment ded-
icated to the automatic development of simulators based on the Finite Element Method
(FEM (LOGAN, 2002)). The term multi-physics is a qualifier to a set of phenomena that
interact in time and space. A multi-physics system can also be called a system of coupled
phenomena. These phenomena are of different natures and behavior scales.
Simulators based on the Finite Element Method can be organized in a layered architecture
(SANTOS; BRITO; BARBOSA, 2006a). The definition of the layers is important to
the system modularity, but this does not indicate either how entities belonging to different
layers interact or what data they share or depend upon. This is certainly very important
for the definition of abstractions that can standardize how layers interact and behave. The
definition of the MPhyScaS architecture is aimed at improving the quality of simulator
designs. The defined architecture attempts to fill the existing gap in the development of
FEM simulators for multi-physics and multi-scale problems.
MPhyScaS simulates real problems involving a set of different physical phenomena.
Each MPhyScaS simulation demands a great amount of time. To improve the performance
of MPhyScaS simulations it is necessary to have an approach to parallelize the processes
that compose the simulation. This approach should define the distribution of processes
and their relationship across a cluster of PCs. It also should make use of the concept
of layers, which is already used in MPhyScaS, in order to define a hierarchical parallel
structure.
1.3 Methodology
Scheduling techniques need to know the dependencies between processes. This makes it
important to obtain this information automatically. For this purpose, we studied MPhyScaS
data in order to identify these dependencies, which can be data dependencies or dependencies
inherited from the MPhyScaS architecture.
In order to analyze MPhyScaS data, we developed a Petri net model which represents
the elements and their relationship. We adopt a Coloured Petri Net (CPN), in which the
token stores the task being executed and its execution timestamps. This model permits
us to analyze all levels of the architecture. Figure 1.1 shows that we also developed a
compiler to create a CPN according to MPhyScaS data. In the same figure, one can note
that we use the CPNTools to analyze the CPN model and provide the dependencies as
the results. The analysis done in CPNTools has the aim of finding parallel processes in
each architecture level and causal dependencies between the processes. These restrictions
can inform, for example, that a process can only be executed after another one or that a
process needs some information calculated from another one.
Figure 1.1: An overview of the whole process proposed (MPhyScaS data is compiled into a CPN, CPNTools analyzes the CPN and produces the dependencies, and the dependencies together with the computer architecture feed the GA, which produces the schedule).
Each object of the MPhyScaS architecture represents one non-preemptive process. In
MPhyScaS, the computation time of each process and the communication cost between
processes, due to the size of required data, are known. Thus, we use an offline scheduling
approach. This approach groups processes that communicate large amounts of data with
each other on the same processor. This decreases the data volume through the network,
decreasing the communication cost of a simulation. In our approach we also consider a
finite number of processors.
We defined metrics to evaluate the scheduling algorithm proposed in this work. Such
metrics simultaneously evaluate three optimization criteria: i) the total execution time
(makespan); ii) the total idle time of each processor (waiting time); iii) the time spent
on communication between processes.
The complexity of the problem relies on the data dependence, because the size of
some required data might be very large, increasing communication costs. Moreover, the
algorithm also takes into account the architecture of the computers where the simulations
will run. Figure 1.1 shows that the algorithm chosen to schedule processes (Genetic
Algorithms - GA) uses the dependencies found in the CPN analysis and also considers the
architecture of the computers to find a schedule for the simulation.
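As a rough sketch of how such criteria can be combined into a single objective, the fragment below (Python; the weights and the example measurements are invented, not taken from this work) aggregates makespan, total waiting time and total communication time of a candidate schedule into one score to be minimized; the actual fitness function used in this work is defined in Chapter 5.

# Hypothetical weights; the dissertation's actual fitness function is defined in Chapter 5.
W_MAKESPAN, W_WAITING, W_COMM = 1.0, 0.5, 0.5

def fitness(makespan, waiting_time, comm_time):
    # Lower is better: the GA would minimise this aggregated value.
    return W_MAKESPAN * makespan + W_WAITING * waiting_time + W_COMM * comm_time

# Example with invented measurements for one candidate schedule.
print(fitness(makespan=120.0, waiting_time=15.0, comm_time=30.0))   # 142.5
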
1.4 Related Work
Xu et al. (XU; SUN; YANG, 2007) used Simulated Annealing (SA) for the fixed job
scheduling problem. Their objective was to minimize the assignment cost of the sequential
network model. In (KANG et al., 2008), Kang et al. proposed a discrete variant of Particle
Swarm Optimization (PSO) to solve job scheduling problems in which all tasks are
non-preemptive and independent of each other. Thus, there is no communication cost to be
worried about.
Chong et al. (CHONG et al., 2006) relied on the nectar collection of honey bee colonies to
create an algorithm for solving the scheduling problem. They consider only the makespan
function to compare their algorithm. The algorithms cited above are of the improvement
type. They start out with a complete schedule, which may be selected arbitrarily, and then
try to obtain a better schedule by manipulating the current schedule.
Among these methods, the Genetic Algorithm (GA) has emerged as a tool that is beneficial
for a variety of fields of study. Several studies have been done to solve the scheduling
problem using GA. Kim (KIM, 2007) proposed a permutation-based elitist genetic algorithm
that used a serial schedule generation scheme for solving a large-sized multiple
resource-constrained project scheduling problem. Kim considers the number of resources,
but does not consider waiting time and communication costs. Moattar et al. (MOATTAR;
RAHMANI; DERAKHSHI, 2007) proposed a GA-based algorithm that finds schedules
where jobs are partitioned between processors and in which total finishing time and waiting
time are minimized. As we did, they used a fitness function based on aggregation to
optimize two criteria simultaneously, but they did not aggregate the communication cost into
their function. Yang et al. (YANG et al., 2001) also rely on GA for solving the scheduling
problem, but they have a different optimization criterion: to close the gap between the
specification of concurrent, communicating processes and the heterogeneous processor target
without compromising required real-time performance and cost-effectiveness.
1.5 Organization
This dissertation is organized as follows. Chapter 2 contains the MPhyScaS definition
and its properties. It also presents the MPhyScaS architecture and the explanation of
each of its levels. In Chapter 3 the scheduling concepts are introduced and
some scheduling algorithms are presented. Chapter 4 introduces the concepts of Petri
nets, shows an extension of Petri nets called Coloured Petri Nets (CPN), and presents a model
definition based on CPN to represent MPhyScaS data, which will help to identify dependencies
between the MPhyScaS architecture objects. Chapter 5 introduces the concepts of
Genetic Algorithms (GA) and presents a new GA approach that considers some new effects
in the scheduling process, aimed at solving the MPhyScaS scheduling problem. In Chapter 6
an MPhyScaS case study is detailed, which will be part of the experiments. Chapter 7
shows the experiments and the results obtained in the application of the proposed technique
compared to others. Chapter 8 concludes this work and presents some plans for
future work.
Chapter 2
MPhyScaS
MPhyScaS (Multi-Physics and Multi-Scale Solver Environment) is an environment ded-
icated to the automatic development of simulators based on the Finite Element Method
(FEM (LOGAN, 2002)). The term multi-physics is a qualifier to a set of phenomena that
interact in time and space. A multi-physics system can also be called a system of coupled
phenomena. These phenomena are of different natures and behavior scales.
If two phenomena are coupled, it means that a part of one phenomenon's data depends
on information from the other phenomenon. Such a dependence may occur in any geometric
part where both phenomena are defined. Another type of data dependence is the case where
two or more phenomena are defined on the same geometric component and share the
geometric mesh.
Usually, simulators based on the Finite Element Method can be organized in a layered
architecture (SANTOS; BRITO; BARBOSA, 2006a). In the top layer, global iterative
loops (for time stepping, model adaptation and articulation of several blocks of solution
algorithms) can be found. This corresponds to the overall scenery of the simulation. The
second layer contains what is called the solution algorithms. Each solution algorithm dic-
tates the way linear systems are built and solved. It also defines the type of all operations
involving matrices, vectors and scalars, and the moment when they have to be performed.
The third layer contains the solvers for linear systems and all the machinery for operat-
ing with matrices and vectors. This layer is the place where all global matrices, vectors
and scalars are located. The last layer is the phenomenon layer, which is responsible for
computing local matrices and vectors at the finite element level and assembling them into
global data structures.
The definition of the layers is important to the system modularity, but this does not indicate
either how entities belonging to different layers interact or what data they share or
depend upon. This is certainly very important for the definition of abstractions that can
standardize how layers interact and behave.
MPhyScaS provides a pattern language to define and represent not only the set of
entities of each layer but also the data transfer and the services between layers. Thus,
MPhyScaS can be considered as a framework that combines a number of computational
entities, which are defined based on the pattern language (SANTOS; BRITO; BARBOSA,
2006b), forming a simulator. This simulator is composed of a set of reusable components
which can be changed in order to reconfigure the simulator (SANTOS; BRITO;
BARBOSA, 2006c,d), changing solution methods or other types of behavior (SANTOS;
VIEIRA; LENCASTRE, 2003). This means that MPhyScaS simulators are flexible,
adaptable and maintainable.
2.1 MPhyScaS Architecture
The MPhyScaS architecture is proposed in (LENCASTRE, 2004). This architecture es-
tablishes a computational representation for the computational layers using patterns (see
Figure 2.1). The Kernel level represents the global scenery level, the level of the solution
algorithms is represented by the Block level, the level of solvers is represented by the
Group level, and the phenomena level is represented by the Phenomenon level.
Figure 2.1: Computational representation for the layers of the simulator.
The definition of this structure is aimed at improving the quality of simulator de-
signs. The defined architecture attempts to fill in the existing gap in the development of
FEM simulators for multi-physics and multi-scale problems. The main requirements of
the MPhyScaS architecture are:
• Flexibility in the development of simulators;
• Extensibility of simulators through the integration of components;
• Improved reusability of processes, data and models.
The MPhyScaS architecture is organized as follows:
• The first layer is the Kernel and it represents the overall scenery of the simulation.
It is responsible for the initialization of procedures (transferred to Blocks in the lower
level), for global time loops and iterations, for global adaptive iterations, and for
articulation of activities to be executed by the Blocks in the lower level. The Kernel
stores system data related to the parameters for its loops and iterations. Therefore,
in a simulation only one Kernel can exist.
• The second layer is the Block and it represents the solution algorithms, in other
words, the way linear systems are built and solved. It is responsible for the transfer
of incoming demands from the Kernel to its Groups in the lower level (initialization
procedures, for instance); for Block local time loops and iterations (inner loops
and iterations inside a global time step, restricted to groups of Phenomena); for
procedures inside time stepping schemes; for Block local iterations (restricted to
some groups of Phenomena); for Block local adaptive iterations (restricted to some
groups of Phenomena); and for operations with global quantities (transferred to Groups
in the lower level, which are the owners of global quantities). The Blocks serve the
Kernel level. Each Block is responsible for a certain number of Groups, which
cannot be owned by another Block. All demands from a Block to the lower level should
be addressed to its Groups. The Blocks store system data related to parameters for
their own loops and iterations and parameters for their procedures.
• The third layer is the Group and it represents the solvers of linear systems. It
is responsible for the transfer of incoming demands from its Block to Phenomena
in the lower level (initialization procedures, for instance); for the assembling coordination
and solution of systems of linear algebraic equations (the method used
depends on the solver component); for operations with global quantities (by demand
from its Block); and for articulation of activities to be executed by its Phenomena
in the lower level (basically concerned with computation and assembling of global
matrices and vectors). The Groups serve their respective Blocks. Each Group is
responsible for a certain number of Phenomena, which cannot be owned by another
Group. All demands from a Group to the lower level should be addressed to its
Phenomena only. The Groups store global matrices, vectors and scalars, and store
the GroupTasks, which are objects encapsulating standard procedures where articulation
of the Group's Phenomena is needed. The GroupTasks are programmable
and their data are standard pieces of information, depending only on the type of the
GroupTask.
• The last layer is the Phenomenon. It is responsible for the computation of local
matrices, vectors and scalars (Phenomenon quantities), and for operations involving
matrices and vectors at the finite element level and their assembling into given
global matrices and vectors. The Phenomena serve their respective Groups. The
Phenomena store data related to constitutive parameters or other parameters which
are specific to the respective Phenomenon; they store the geometry where the respective
Phenomenon is defined (different Phenomena may share a geometry or a part of it);
and they store WeakForms, which are tools for computing and assembling quantities defined
on a certain part of the geometry. A WeakForm may be active or not. Only active
WeakForms can be used during a simulation. A WeakForm may store parameters
which are related to specific simulation data (for instance, functions for the definition
of the boundary conditions or parameters needed for the computation of a
quantity, which should be given together within a simulation data set). The Phenomenon
should also store methods, which are tools to be used in certain Phenomenon-specific
tasks. For instance, those tasks can be the generation of geometric and Phenomenon
meshes, numerical integration at the element level, shape functions, etc.
The simulation starts with the execution of the root of the Kernel, which uses services
provided by the Blocks. Each Block in turn uses services provided by the Groups. Each
Group owns a set of Phenomena, which are used to produce local structures
(matrices, vectors, and scalars) and assemble them into a given (by the Group)
global one.
Figure 2.2 shows that the Kernel can articulate several Blocks; each Block can articulate
several Groups; each Group can articulate several Phenomena; and each Phenomenon
is capable of computing and assembling a set of local Quantities (matrices, vectors and
scalars).
Figure 2.2: Each Phenomenon is capable of computing a set of quantities.
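To make the containment between layers concrete, the following sketch (Python; the class and attribute names are illustrative only and are not part of MPhyScaS) mirrors the Kernel → Block → Group → Phenomenon hierarchy described above.

# A minimal structural sketch of the MPhyScaS layer hierarchy; names are illustrative only.
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class Phenomenon:
    name: str
    quantities: List[str] = field(default_factory=list)      # local matrices/vectors/scalars it can compute

@dataclass
class Group:
    name: str
    phenomena: List[Phenomenon] = field(default_factory=list)  # Phenomena owned exclusively by this Group
    states: Dict[str, object] = field(default_factory=dict)    # the States of its Phenomena live here

@dataclass
class Block:
    name: str
    groups: List[Group] = field(default_factory=list)          # Groups owned exclusively by this Block

@dataclass
class Kernel:
    blocks: List[Block] = field(default_factory=list)          # a simulation has exactly one Kernel
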
The States that configure the Phenomenon are stored in the respective Group, where
solvers for linear systems are located. This is convenient, due to the fact that the Group is
responsible not only for assembling and solving algebraic systems, but also for operating
with the structures when requested by its Block (see Figure 2.3).
Figure 2.3: Each Phenomenon has its own set of States, which is stored in its Group.
A quantity that a Phenomenon can compute and assemble can be coupled with another
Phenomenon's states (one or more), as depicted in Figure 2.4. MPhyScaS provides
all the machinery to make this procedure automatic, following the specification of some
data related to the place where the coupling occurs. These data should be given to the object
responsible for the computation.
Figure 2.4: A quantity computed by a Phenomenon may be coupled to a State from another
Phenomenon.
2.2 MPhyScaS Parallel Architecture
The original architecture of MPhyScaS provides support to the automatic building of
sequential simulators only. For instance, it does not have abstractions that could automatically
define the distribution of data and procedures and their relationships across a cluster
of PCs. The MPhyScaS parallel architecture (called MPhyScaS-P) satisfies a number of
new requirements, including the support of parallel execution of the simulators in clusters
of PCs.
MPhyScaS-S (sequential) is a framework with the support of an extended finite element
library and a knowledge management system. MPhyScaS-P uses the same extended
library and knowledge management system as MPhyScaS-S (with minor differences).
It also makes use of the concept of layers already used in MPhyScaS-S in order to define
a hierarchical parallel structure.
At least four hierarchical levels of different procedures and memory usage can be
found in modern clusters of PCs (SANTOS et al., 2008):
• Cluster Level: it is composed of all processes running in all machines being used
in a simulation;
• Machine Level: it is composed of all processes running in one individual machine
among all those used in a simulation;
• Processor Level: it is composed of all processes running in one individual processor
among all those running in one individual machine;
• Process Level: it is composed of one single process running in one individual
processor among all other processes in this same processor. It can be divided into
two groups:
- Core Sub-Level: it is composed of all parts of the code from one individual
process which are not strongly hardware specific;
- Software-Hardware (SH) Sub-Level: it is composed of all parts of the code
from one individual process which are strongly hardware specific (cache management,
FPGA acceleration).
Whenever the architecture of a computational system allows for a hierarchy of procedures,
it may be a good idea to define a hierarchy of processes in such a way that a few
of them accumulate some very light management tasks. The benefits of this strategy
include (SANTOS et al., 2008): (i) procedures can be hierarchically synchronized
(from coarse to fine grain), reducing management concerns and increasing correctness;
(ii) since locality concerns change along the hierarchy levels, memory management can
become more and more specialized from top to bottom; (iii) the hierarchy allows for the
encapsulation of services and procedures, making the exchange of components easier.
Besides the natural benefit of this aspect, it also allows for the adaptation of the code to
new hardware and software technologies, without incurring heavy reprogramming in
all levels of the hierarchy.
The MPhyScaS-S architecture has satisfactorily solved the main problems related to
data dependence and sharing between phenomena for sequential processing. It has also
been able to represent solution algorithms in such a way that the entire simulator can
be reconfigured with a minimum amount of reprogramming (maximum amount of reuse).
interprocess communication, which can be of four types (SANTOS et al., 2008):
1. Communication during linear algebra operations: Interprocess communication
is needed during several linear algebra operations. For instance, parallel matrix-
vector multiplication requires interprocess communication in order to complement
the job done by each process;
2. Communication along process hierarchy: The establishment of the hierarchy of
procedures will require some processes to assume some kind of leadership, depend-
ing on the layer where they are located. This will require interprocess communica-
tion throughout the hierarchy;
3. Communication for coupled information - I: MPhyScaS-S has dealt with coupling
dependencies between different phenomena already, but in parallel processing
this type of dependence can become very complex. For instance, this is the case
whenever the interface between two coupled phenomena coincides with a boundary
between two components of the mesh partition. That means that vector fields and
respective mesh data have to be transferred from one process to the other;
4. Communication for coupled information - II: If two coupled phenomena have
different geometric meshes, all components in one mesh partition may be different
from all components in the other partition. Thus, there will be a need for interprocess
data transfer from one process to the other whenever coupled quantities have
to be computed.
Well-thought-out distribution schemes and data representation abstractions can eliminate
both types of communication for coupled information. This can be done by copying the
coupled vector fields and mesh data to the processes where they are needed. Those copies
will be updated whenever needed (communication of type 1 only). Processes will have to
be given a larger memory space, but that can be kept to a minimum. The important thing is
that the whole data set of a coupled mesh will not be transferred between two processes
when a coupled quantity is to be computed. Thus, only types 1 and 2 will be needed.
They will be called communication of Type-I and Type-II, respectively.
In order to simplify the presentation of MPhyScaS-P's architecture, some assumptions
have to be made (SANTOS et al., 2008): (i) if two or more phenomena are coupled in one
geometric entity, then they share the same geometric mesh on that geometric entity; (ii) all
phenomena should be represented in each process with a nonempty geometric mesh; (iii)
only three hierarchical levels will be considered: Cluster, Machine, and Process levels.
There are two views of the MPhyScaS-P’s architecture (SANTOS et al., 2008):
• Logical View: The logic of MPhyScaS-P's workflow is the same as MPhyScaS-S's,
that is, it has the same levels of procedures (Kernel, Blocks, Groups and Phenomena),
all relationships between them are preserved, and all procedures within each
layer are technically the same (besides the fact that data are now distributed). Therefore,
the relationships among entities in the different levels of MPhyScaS-P also form
a DAG (Directed Acyclic Graph). Thus, we are able to clone a suitable modification
of MPhyScaS-S to all processes in an SPMD (Single Program, Multiple Data)
scheme. In this sense, one can imagine that MPhyScaS-P is MPhyScaS-S with
distributed data and a hierarchical synchronization scheme (see Topological View
below);
• Topological View: The topology of the procedures in the workflow of MPhyScaS-S
is implemented in MPhyScaS-P in a hierarchical form with the aid of a set of
processes, which are responsible for the procedure synchronization. There are
three types of leader processes (see Figure 2.5):
- Cluster Rank Process: It is responsible for the execution of the Kernel and for
synchronizing the beginning and the end of each one of its level's tasks, which
requires demands to the lower level processes. In a simulation there is only one
ClusterRank process (for instance, the process with rank equal to zero in an
MPI based system). Figure 2.6 depicts the relationship between a ClusterRank
process and the simulator layers;
- Machine Rank Process: One process is chosen among all processes running
in an individual machine to be its leader. Thus, there is only one MachineRank
process per machine. It is responsible for the execution of procedures in the
Block level and for synchronizing the beginning and the end of each one of its
level's tasks, which requires demands to lower level processes. The ClusterRank is
also the MachineRank in its own machine. Figure 2.7 depicts the relationship
between a MachineRank process and the simulator layers;
- Process Rank Process: It is responsible for the execution of the procedures
in the Group level. The ClusterRank and all MachineRank processes are also
ProcessRank processes. Figure 2.8 depicts the relationship between a ProcessRank
process and the simulator layers.
Figure 2.5: Hierarchy of the simulator in MPhyScaS-P.
Figure 2.6: Layers with procedures executed by ClusterRank in MPhyScaS-P (process 000 executes procedures in all layers from the Kernel downwards).
Figure 2.7: Layers with procedures executed by MachineRank in MPhyScaS-P (process 100 executes procedures in all layers from the Block downwards).
Figure 2.8: Layers with procedures executed by ProcessRank in MPhyScaS-P (the remaining processes execute procedures in all layers from the Group downwards).
Knowing that MPhyScaS-S transfers commands from the Kernel level down to the
Phenomenon level in the form of a tree structure, it can be seen that a ClusterRank only
demands services from all MachineRanks, which only demand services from all of their
ProcessRanks. Since the activities in one level return to the level immediately above
after they are finished (with the exception of the Kernel level), there are natural ways
of synchronizing each activity. Furthermore, there is certainly an advantage in the
tremendous simplification of the synchronization tasks. Note that the activities in the Group
and Phenomenon levels are left for a finer granularity of management. In both levels there
are well-localized CPU-intensive operations, which could be accelerated with suitable
software optimization and the use of hardware devices.
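As a small illustration of this hierarchy of leader processes (not code from MPhyScaS-P), the sketch below assigns the ClusterRank, MachineRank and ProcessRank roles from a hypothetical mapping of process ranks to machines, following the rules above: rank 0 is the ClusterRank, each machine has exactly one MachineRank, and every process is a ProcessRank.

# Hypothetical layout: process rank -> machine name.
machine_of = {0: "m0", 1: "m0", 2: "m0", 3: "m1", 4: "m1"}

def assign_roles(machine_of):
    roles = {rank: {"ProcessRank"} for rank in machine_of}   # every process is a ProcessRank
    roles[0].add("ClusterRank")                               # rank 0 leads the whole cluster
    leader_of = {}                                            # machine -> lowest rank running on it
    for rank, machine in sorted(machine_of.items()):
        leader_of.setdefault(machine, rank)
    for machine, leader in leader_of.items():
        roles[leader].add("MachineRank")                      # one MachineRank per machine
    return roles

for rank, role_set in sorted(assign_roles(machine_of).items()):
    print(rank, sorted(role_set))

Note that, as stated above, the ClusterRank (rank 0) is also the MachineRank of its own machine and a ProcessRank.
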
Chapter 3
Scheduling Algorithms
Scheduling is a decision-making process that is used on a regular basis in many manufacturing
and services industries. It deals with the allocation of resources to tasks over
given time periods and its goal is to optimize one or more objectives (PINEDO, 2008).
Definition 3.0.1 A task system is a weighted directed acyclic graph G = (V, E, ω) where:
• the finite set V of vertices represents the tasks;
• the set E of edges represents precedence constraints between tasks: e = (u, v) ∈ E
if and only if u ≺ v, meaning that task u precedes task v;
• the weight function ω : V → N* gives the weight (or duration) of each task. Task
weights are assumed to be positive integers.
A schedule σ of a task system is a function that assigns a beginning time and a target
processor to each task. A schedule must preserve the dependence constraints induced
by the precedence relation and materialized by the edges of the dependence graph: if
u ≺ v, then the execution of u begins at time σ(u) and requires ω(u) units of time, and
the beginning of the execution of v at time step σ(v) must be posterior to the end of the
execution of u. Assume the following formal definition (DARTE; ROBERT; VIVIEN,
2000).
Definition 3.0.2 A schedule of a task system G = (V, E, ω) is a function σ : V → N such
that σ(u) + ω(u) ≤ σ(v), whenever e = (u, v) ∈ E.
Often there are other constraints that must be met by schedules, called resource constraints.
When there is only a fixed number p of available processors, we say that we
have a problem with limited resources Pb(p). The resource constraints are expressed as
follows: each task v ∈ V is allocated to a processor alloc(v) using an allocation function
alloc : V → P, where P = {1, ..., p} denotes the set of available processors. The relationship
between the schedule σ and the allocation alloc is that no processor can be allocated to
more than one task at the same time step. This translates into the following condition:
T ≠ T′ and alloc(T) = alloc(T′)  ⟹  σ(T) + ω(T) ≤ σ(T′) or σ(T′) + ω(T′) ≤ σ(T)        (3.1)
This condition expresses the fact that if two tasks T and T′ are allocated to the same
processor, then the execution of T must be completed before the execution of T′ can
begin, or vice versa.
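Read together, Definition 3.0.2 and condition (3.1) give a direct validity check for a candidate schedule, sketched below (Python; the example tasks, times and allocation are invented).

# sigma: task -> start time, omega: task -> duration, alloc: task -> processor,
# edges: precedence pairs (u, v) meaning u must finish before v starts.
def is_valid_schedule(sigma, omega, alloc, edges):
    # Definition 3.0.2: sigma(u) + omega(u) <= sigma(v) for every edge (u, v).
    for u, v in edges:
        if sigma[u] + omega[u] > sigma[v]:
            return False
    # Condition (3.1): tasks allocated to the same processor must not overlap in time.
    tasks = list(sigma)
    for i, t in enumerate(tasks):
        for t2 in tasks[i + 1:]:
            if alloc[t] == alloc[t2]:
                overlap = not (sigma[t] + omega[t] <= sigma[t2] or
                               sigma[t2] + omega[t2] <= sigma[t])
                if overlap:
                    return False
    return True

# Hypothetical data: two processors, three tasks, one precedence edge.
sigma = {"A": 0, "B": 0, "C": 3}
omega = {"A": 3, "B": 2, "C": 1}
alloc = {"A": 0, "B": 1, "C": 0}
print(is_valid_schedule(sigma, omega, alloc, [("A", "C")]))   # True
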
3.1 Classification of Scheduling Problems
Scheduling problems can be classified in terms of three fields (BRUCKER, 2006):
• α, that specifies the machine environment;
• β, that specifies the job characteristics;
• γ, that denotes the optimality criterion.
3.1.1 Machine Environment
The machine environment is characterized by a string with two parameters α1 and α2,
represented as α = α1 α2. The first parameter α1 can assume the following values:
α1 ∈ {◦, P, Q, R, PMPM, QMPM, G, J, F, O, X}. The second parameter α2 should assume
a positive integer value to denote the number of machines available. If α2 is an arbitrary
but fixed number of machines, α2 must be set to ◦.
If α1 is set to ◦, this means that there are no parallel machines. With this parameter,
all jobs must be processed in a single machine.
If α1 is set to {P, Q, R}, it represents an environment with parallel machines. For
identical parallel machines, α1 must be set to P. In this case, the processing time of a job
is the same for all machines (see Definition 3.1.1).
Definition 3.1.1 Let J = {j0, j1, ..., jn} be the set of jobs and M = {m0, m1, ..., mx} be the set
of machines. Then, ∀ mk ∈ M, ω(ji, mk) = pi.
Q represents the uniform parallel machines, that is, machines that run at different but
related speeds. Thus, the processing time of a job depends on the machine speed, as
shown in Definition 3.1.2. Finally, if α1 is set to R, then there are unrelated parallel
machines. In this case, we cannot define the processing time of a job.
Definition 3.1.2 Let J = {j0, j1, ..., jn} be the set of jobs, M = {m0, m1, ..., mx} be the set
of machines, and S = {s0, s1, ..., sx} be the set of speeds of the machines in M. Then,
∀ mk ∈ M, ω(ji, mk) = pi / sk.
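The sketch below (Python; the jobs, machines and speeds are invented) contrasts the three parallel-machine environments: for identical machines (P) the processing time is simply pi, for uniform machines (Q) it is pi/sk, and for unrelated machines (R) it has to be given as an arbitrary job-by-machine table.

# Hypothetical data: base processing requirements p_i and machine speeds s_k.
p = {"j0": 6.0, "j1": 4.0}          # job -> base processing requirement
s = {"m0": 1.0, "m1": 2.0}          # machine -> speed (uniform machines, alpha_1 = Q)

def omega_identical(job, machine):   # alpha_1 = P: same time on every machine
    return p[job]

def omega_uniform(job, machine):     # alpha_1 = Q: time scales with the machine speed
    return p[job] / s[machine]

# alpha_1 = R: unrelated machines, processing times form an arbitrary job x machine table.
omega_unrelated = {("j0", "m0"): 6.0, ("j0", "m1"): 5.0,
                   ("j1", "m0"): 4.0, ("j1", "m1"): 9.0}

print(omega_identical("j0", "m1"))   # 6.0
print(omega_uniform("j0", "m1"))     # 3.0
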
When the machine environment contains multi-purpose machines, they can be specified
as PMPM or QMPM. In the first case, there are multi-purpose machines with identical
speeds. In the second case, there are multi-purpose machines with uniform speeds.
In the other cases (α1 ∈ {G, J, F, O, X}), we have a multi-operation model, i.e.,
associated with each job Ji there is a set of operations Oi1, Oi2, ..., Oin. The machines are
dedicated. Furthermore, there are precedence relations between arbitrary operations. This
general model is called a general shop. We indicate the general shop by setting α1 = G.
The symbols J, F, O, X designate special cases of the general shop.
The job shop scheduling problem, represented by α1 = J, is briefly described as
follows (TSUTSUMI; FUJIMOTO, 2005). There is a set of jobs and a set of machines.
Each job consists of a sequence of operations which have precedence constraints of the
form Oij → Oi,j+1 and must be processed in order. Each operation needs a fixed duration
on one of the machines. Each machine can process at most one operation at a time,
and each operation is not interrupted once it has started. Figure 3.1 shows an example of
a job shop scheduling problem with three jobs (each of them with four operations) being
executed on four machines.
Figure 3.1: Example of a job shop scheduling problem.
When α1 is set to F, it indicates a flow shop scheduling problem. In this
scheduling problem, there is a set of machines and each operation Oij of a job i must be
processed on machine mj. Here, the operations of a job have the same precedence
constraints as in the job shop scheduling problem. Figure 3.2 shows an example of a flow
shop scheduling problem. In this case, the number of operations of each job should be
equal to the number of machines (WANG et al., 2005).
Figure 3.2: Example of a flow shop scheduling problem.
The open shop scheduling problem, denoted by O, is similar to the flow shop scheduling
problem. The main difference between them is that in the flow shop problem the
operations of each job need to be processed in order, i.e., one may not begin processing
operation Oij until operation Oi,j-1 has been completed (LEUNG; KELLY; ANDERSON,
2004). In the open shop problem the order in which operations are processed is irrelevant.
Figure 3.3 illustrates an open shop scheduling problem.
Figure 3.3: Example of an open shop scheduling problem.
The mixed shop scheduling problem is denoted by the symbol X. The mixed shop problem is a combination of the job shop problem and the open shop problem (BRUCKER, 2006). Thus, we have open shop jobs and job shop jobs. The number of operations of job J_i is denoted by n_i.
3.1.2 Job Characteristics
The job characteristics are specified by a set β containing at most six elements β_1, β_2, β_3, β_4, β_5, and β_6 (BRUCKER, 2006).
β_1 indicates whether preemption is allowed. Preemption is the act of temporarily interrupting a job with the intention of resuming it at a later time. In other words, the preempted job is resumed and put back into execution from the point of preemption, without loss of the prior work (BOBBIO, 1994). If preemption is allowed, we set β_1 = pmtn. Otherwise, β_1 does not appear in β.
β_2 describes the precedence relations between jobs. A precedence relation is a binary relation between jobs, that is, "job J_i precedes job J_j" means that the processing of job J_j cannot be started until the completion of job J_i. A set of precedence relations can be represented by a Directed Acyclic Graph (DAG) G. If G is an arbitrary DAG, we set β_2 = prec. Other ways to represent it are chains, an intree, an outtree, a tree, or a series-parallel directed graph. In these cases, we set β_2 equal to chains, intree, outtree, tree, and sp-graph, respectively. If there are no precedence relations, β_2 does not appear in β.
If β_3 = r_i, then release dates may be specified for each job. The release date is the time before which the job cannot start; this means that each job starts only after its release date. If r_i = 0 for all jobs, then β_3 does not appear in β.
β_4 specifies restrictions on the processing times. If β_4 is equal to p_i = 1 for all jobs, then each job has a unit processing requirement, which means that each job requires only one unit of time to be processed. Similarly, β_4 can be easily interpreted when set to p_i = p, which means that each job has processing time equal to p, or to p_i ∈ {1, 2}, which means that the processing time of each job is either 1 or 2.
The deadline of a job is specified in β_5. For each job J_i a deadline d_i must be specified. The deadline represents the latest time at which a job may finish its execution; in other words, the job must finish no later than time d_i.
Some scheduling algorithms consider sets of jobs grouped into batches. A batch is a set of jobs which must be processed jointly on a machine (KASHAN; KARIMI; JENABI, 2008). A batching problem is to group the jobs into batches and schedule them. There are two types of batching problems: the p-batching problem and the s-batching problem. In the p-batching problem, the length of a batch is equal to the maximum of the processing times of all jobs in the batch. In the s-batching problem, the length of a batch is equal to the sum of the processing times of all jobs in the batch. If the scheduling problem considers batches, then β_6 must be set to β_6 = p-batch or β_6 = s-batch. Otherwise, β_6 does not appear in β.
3.1.3 Optimality Criterion
The optimality criterion specifies what needs to be optimized in the problem and how this optimization should occur. It must be represented as a mathematical function involving the relevant properties of the problem, in order to measure the quality of a solution (BARESEL; STHAMER; SCHMIDT, 2002). This function should guide the process of searching for the best value for the problem (BARESEL et al., 2004), which is the optimal value of the function. The optimization criterion can be the minimization or the maximization of this function, called the objective function.
In scheduling problems, there are some objective functions that are known as the most common ones: the makespan (max{C_i | i = 1, ..., n}), the total flow time (Σ_{i=1}^{n} C_i), and the weighted total flow time (Σ_{i=1}^{n} ω_i C_i) (BRUCKER, 2006), where C_i is the finishing time of job J_i. In these cases we write γ = C_max, γ = Σ C_i, and γ = Σ ω_i C_i, respectively.
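As a minimal sketch with made-up completion times and weights, the three objective functions can be evaluated as follows:

```python
# Hypothetical example: evaluating the three common objective functions
# from the completion times C_i of the jobs.
C = [3, 5, 9, 12]   # completion times C_i (made-up values)
w = [1, 2, 1, 3]    # weights omega_i (made-up values)

makespan = max(C)                                           # C_max
total_flow_time = sum(C)                                    # sum of C_i
weighted_flow_time = sum(wi * ci for wi, ci in zip(w, C))   # sum of omega_i * C_i

print(makespan, total_flow_time, weighted_flow_time)        # 12 29 58
```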
3.2 Compared Scheduling Algorithms
In scheduling problems, n jobs with processing times p_1, p_2, ..., p_n have to be distributed among m identical machines so as to minimize the time span of the resulting schedule. If the completion time of the jth job is denoted by C_j, then this criterion amounts to the minimization of C_max = max_{j=1,...,n} {C_j}, also called the makespan. Note that a solution corresponds to an assignment of jobs to machines, and the exact order in which the jobs are processed on the machines is important due to the dependencies between jobs. Furthermore, we will consider identical machines, so we do not distinguish between them. Therefore, an assignment of jobs to machines is actually a partition of the jobs into m subsets.
Some scheduling algorithms are presented in this section: list scheduling; SPT (Short-
est Processing Time); and LPT (Longest Processing Time). They will be compared to the
algorithm presented in Chapter 5. This comparison will be done in terms of makespan,
the communication cost, and the waiting time of each processor. One can note that the
algorithms presented in this section only consider the first criterion.
3.2.1 List Scheduling
One of the most often used general approximation strategies for solving scheduling prob-
lems is list scheduling (BLAZEWICZ et al., 2007). The basic idea of the list scheduling
is to make a scheduling list (a sequence of nodes for scheduling) by assigning them some
priorities. A node is stored in a priority ordered list as soon as it is ready. Nodes that have
all immediate predecessors executed are referred to as being ready.
In a traditional scheduling algorithm, the scheduling list is statically constructed be-
fore node allocation begins, and most importantly, the sequencing in the list is not mod-
ified (SIMION et al., 2007). Depending on the way the priority list is constructed, the schedules produced by such a heuristic may be quite poor or quite good. Thus, this class of heuristics provides a natural vehicle for the analysis of the relation between solution quality and robustness (KOLEN et al., 1994).
When tasks are scheduled to resources, there may be some holes between scheduled tasks due to dependencies between the tasks of the application. When a task is scheduled into the first available hole, this is called the insertion approach. If the task can only be scheduled after the last task scheduled on the resource, without considering holes, it is called a non-insertion approach. The insertion approach performs much better than the non-insertion one, because it utilizes idle times better (SIMION et al., 2007).
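A minimal sketch of the insertion approach is given below; the helper name and the interval representation are our own assumptions, not taken from the cited works. Given the busy intervals already scheduled on one resource, it returns the earliest start time at which a task of a given duration fits, either in a hole between scheduled tasks or after the last one.

```python
# Simplified sketch of the insertion approach: find the earliest start time
# for a task of length `duration` that is ready at time `ready`, given the
# busy intervals (start, end) already scheduled on one resource.
def earliest_insertion_start(busy, ready, duration):
    busy = sorted(busy)
    candidate = ready
    for start, end in busy:
        if candidate + duration <= start:   # the task fits in the hole before this interval
            return candidate
        candidate = max(candidate, end)     # otherwise try after this interval
    return candidate                        # after the last scheduled task

# Example: intervals [0,2) and [5,9) leave a hole [2,5); a task of length 3
# that is ready at time 1 can be inserted at time 2.
print(earliest_insertion_start([(0, 2), (5, 9)], ready=1, duration=3))  # 2
```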
The list scheduling algorithm then executes the steps in the following pseudo-code. The first and second lines initialize a list of ready jobs and sort it into priority order. In this stage, the ready jobs are the ones that have no dependences even before the scheduling algorithm begins. In lines 3 to 5, s_k is initialized, where s_k represents the sum of the execution costs assigned to processor p_k. Then, the algorithm repeatedly assigns the first job of the ready jobs list to the first free processor (lines 7 to 9) until all jobs are assigned to some processor. In the same loop, the algorithm also updates the s_k value of the processor that received the job: s_k is increased by the execution cost of the job (C_ji, line 10). Before executing the next iteration of the loop, the algorithm updates the ready jobs list by removing the assigned jobs and adding the jobs that became ready at the correct position in the list (sorted by priority). At the end of the algorithm it is possible to know the makespan value, which is the maximum value of s_k (line 13).
Pseudo-Code 1: List scheduling algorithm pseudo-code.
1  Initialize Ready Jobs
2  Sort Ready Jobs By Priority
3  foreach processor p_k, k = 1, ..., m do
4      s_k = 0
5  end
6  while Ready Jobs ≠ ∅ do
7      j_i = Get First Job Of Ready Jobs
8      p_k = Get First Processor Free
9      Assign Job To Processor(j_i, p_k)
10     s_k = s_k + C_ji
11     Update Ready Jobs
12 end
13 makespan = max{s_k | k = 1, ..., m}
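A minimal executable sketch of Pseudo-Code 1 is shown below. It is a simplification under our own assumptions (priorities and execution costs are given directly, dependences are handled through a predecessor count, and, as in the pseudo-code, only the accumulated cost s_k per processor is tracked, not start times); the names are hypothetical.

```python
import heapq

def list_schedule(cost, priority, preds, m):
    """Greedy list scheduling: cost[j] is the execution cost of job j, priority[j]
    its priority (smaller = scheduled first), preds[j] the set of its predecessors,
    and m the number of processors. Returns (makespan, assignment)."""
    n = len(cost)
    remaining = [len(preds[j]) for j in range(n)]
    succs = [[] for _ in range(n)]
    for j in range(n):
        for p in preds[j]:
            succs[p].append(j)
    ready = [(priority[j], j) for j in range(n) if remaining[j] == 0]
    heapq.heapify(ready)
    load = [0.0] * m                    # s_k: accumulated cost on each processor
    assignment = [None] * n
    while ready:
        _, j = heapq.heappop(ready)     # first job of the ready list
        k = min(range(m), key=lambda q: load[q])   # first free (least-loaded) processor
        assignment[j] = k
        load[k] += cost[j]              # s_k = s_k + C_j
        for s in succs[j]:              # update the ready list
            remaining[s] -= 1
            if remaining[s] == 0:
                heapq.heappush(ready, (priority[s], s))
    return max(load), assignment

# Hypothetical instance: 4 jobs, job 3 depends on jobs 0 and 1, two processors.
print(list_schedule([4, 2, 1, 3], [0, 1, 2, 3], [set(), set(), set(), {0, 1}], 2))
# -> (6.0, [0, 1, 1, 1])
```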
As most heuristics, list scheduling assumes fully connected homogeneous processors and ignores contention on the communication links (SINNEN; SOUSA, 2001). Overall, list scheduling algorithms have been widely used in synthesis systems (MICHELI, 1993). Two well-known examples of permutation list scheduling rules are the SPT (Shortest Processing Time) and the LPT (Longest Processing Time) rules. The quality of the solutions produced by these rules is very different (KOLEN et al., 1994).
3.2.2 Longest Processing Time - LPT
One of the simplest algorithms is the LPT algorithm (BLAZEWICZ et al., 2007), in which the tasks are arranged in order of decreasing processing times. The LPT rule tries to place shorter jobs towards the end of the schedule, where they can be used to balance the load. In the case of identical parallel machines, it has been shown that the makespan of the schedule constructed with this rule is less than 4/3 of the optimal makespan (ALTENDORFER; KABELKA; STOCHER, 2007). It is hence also used for maximizing machine utilization.
To demonstrate the number of possible assignments that can be generated by a given LPT rule when uncertainty exists with respect to one of the processing times, we will conduct an experiment carrying out a parametric analysis with respect to p_1. This means that we will study the LPT rule when p_1 = λ, where λ gradually increases from zero to infinity. Besides the number of solutions generated by the LPT rule, it is also natural to study the solution value C_max as a function of λ (KOLEN et al., 1994).
We will present an example with two machines and four jobs having processing times p_1 = λ, p_2 = 1, p_3 = 2, and p_4 = 4, which are scheduled according to the LPT rule. The different schedules and the corresponding linear parts of the makespan (C_max) are given in Figure 3.4.
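The makespans summarized in Figure 3.4 can be reproduced with a minimal sketch, assuming the greedy reading of LPT in which the next job on the list is assigned to the currently least-loaded machine:

```python
def lpt_makespan(times, m=2):
    """LPT: sort jobs by non-increasing processing time and assign each to the
    least-loaded machine; return the resulting makespan."""
    load = [0.0] * m
    for t in sorted(times, reverse=True):
        k = min(range(m), key=lambda q: load[q])
        load[k] += t
    return max(load)

# The example of Figure 3.4: p1 = lam, p2 = 1, p3 = 2, p4 = 4, two machines.
for lam in (0.5, 1.5, 2.5, 3.5, 4.5, 5.5, 6.5, 8.0):
    print(lam, lpt_makespan([lam, 1, 2, 4]))
# The printed makespans follow the ranges summarized in Figure 3.4.
```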
Let us now look at an arbitrary permutation list scheduling heuristic applied to an arbitrary problem instance. When λ is increased from zero to infinity, the ordering of the jobs according to non-decreasing processing times will change. Hence the list will change too, although the relative order of the jobs 2, 3, ..., n will remain the same. If λ is equal to one of the processing times p_2, ..., p_n, then we may assume that, in the order according to non-decreasing processing times, job 1 follows directly after the job with the largest index whose processing time equals λ. Thus, a further increase of λ leaves the current ordering unchanged until λ becomes equal to the next larger processing time. It is easily verified that C_max is continuous in λ = p_j (j = 2, ..., n) (KOLEN et al., 1994).
3.2.3 Shortest Processing Time - SPT
According to the SPT rule, jobs appear on the list in order of increasing processing times. For n jobs with processing times p_1 = λ < p_2 ≤ ... ≤ p_n and j = 1, 2, ..., p, we may assume that the job with processing time p_j is scheduled on processor m_j. Since m_1 is the first processor which becomes available again, the job with processing time p_{j+1} is scheduled on m_1. Because p_1 + p_{j+1} ≥ p_j, processor m_1 now becomes the maximum processor and the job with processing time p_{j+2} is scheduled on m_2. Since p_2 ≥ p_1 and p_{j+2} ≥ p_{j+1}, processor m_2 now becomes the maximum processor and the job with processing time p_{j+3} is scheduled on m_3. Continuing this way, we find that an SPT schedule can always be assumed to have the following structure: job j (the one with processing time p_j) is scheduled on processor m_i, where i is given by i = [(j - 1) mod p] + 1, j = 1, ..., n, and p is the number of available processors. The maximum processor is the one processing job n.
The SPT rule has its strength with respect to utilization maximization especially if
some jobs take considerably longer than average. It avoids clogging a very busy ma-
chine with such a long-duration job by giving preference to jobs with shorter processing
times, reserving the long jobs for inflexible machines (ALTENDORFER; KABELKA;
STOCHER, 2007). When the objective is the sum of completion times, the minimum
value is achieved by sequencing the jobs in order of Shortest Processing Time. No mat-
ter what deadline is specified, SPT sequencing maximizes the number of jobs completed
by that time; choosing the small tasks obviously works better than choosing the large
tasks, if your objective is to complete as many as possible. It is not hard to see that
SPT is also optimal for the sum of waiting times (and the average of waiting times) as
well (BLAZEWICZ et al., 2007).
The different schedules and the corresponding linear parts of the makespan (C_max) for the same example presented in the previous section are shown in Figure 3.5. These schedules are produced according to the SPT rule. Here, λ also increases from zero to infinity, where p_1 = λ, p_2 = 1, p_3 = 2 and p_4 = 4.
Then, it is possible to say that SPT has mainly two properties (LEUNG; KELLY; ANDERSON, 2004):

1. When the objective is the sum of completion times, the minimum value is achieved by sequencing the jobs in the order of Shortest Processing Time - This property articulates the virtue of shortest-first sequencing. For any set of n jobs, shortest-first priority leads to a sequence that minimizes the sum of completion times (see the sketch after this list). Thus, whenever the quality of a sequence is measured by the sum of completion times, we can be sure that the appropriate sequencing rule is shortest first.

2. When the objective is the sum of weighted completion times, the minimum value is achieved by sequencing the jobs in the order of Shortest Weighted Processing Time - This reveals that, to reconcile the conflict between processing time and importance weight, we should use the ratio of time to weight. Viewed another way, the implication is that we should first scale the processing times (divide each one by its corresponding weight) and then apply the shortest-first principle to the scaled processing times.
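To illustrate Property 1, a minimal sketch with made-up processing times compares the sum of completion times of the shortest-first order with that of the longest-first order on a single machine:

```python
from itertools import accumulate

def total_completion_time(order):
    """Sum of completion times when jobs are processed in the given order on one machine."""
    return sum(accumulate(order))

times = [4, 1, 3, 2]                  # made-up processing times
spt = sorted(times)                   # shortest first
lpt = sorted(times, reverse=True)     # longest first
print(total_completion_time(spt))     # 1 + 3 + 6 + 10 = 20
print(total_completion_time(lpt))     # 4 + 7 + 9 + 10 = 30
```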
Recall that a sequencing problem is defined by a set of jobs and their processing times, along with an objective. The choice of objective is a key step in specifying the problem, and the use of the sum of completion times may sound specialized. However, SPT is optimal for other measures as well.

First, note that Property 1 applies to the objective of the average completion time. The average would be computed by dividing the sum of completion times by the (given) number of jobs. In general, that would mean dividing by the constant n. Therefore, whether we believe that the sum of completion times or the average of the completion times more accurately captures the measurement of schedule effectiveness, we can say that SPT is optimal (CHAJED; LOWE, 2008).
Second, consider the objective of completing as many jobs as possible by a pre-specified deadline. No matter what deadline is specified, SPT sequencing maximizes the number of jobs completed by that time. This might not be surprising to anyone who has tried to squeeze as many different tasks as possible into a finite amount of time (LEUNG; KELLY; ANDERSON, 2004). Choosing the small tasks obviously works better than choosing the large tasks, if our objective is to complete as many as possible. Taking this observation to its limit, it follows that a shortest-first sequence maximizes the number of tasks completed.
Finally, consider the completion time as a measure of the delay experienced by an individual customer. We could argue that the total delay has two parts: the wait for the work to start and the time of the actual work itself. The processing time is a function of the order requested by the customer and is therefore likely to be under the customer's control. The wait until the work starts depends on the sequence chosen. Thus, we might prefer to take as an objective the sum of the wait times. Although this seems at first glance to be a different problem, it is not hard to see that SPT is optimal for the sum of the wait times (and the average of the wait times) as well (KOLEN et al., 1994).
The dual benefits of SPT sequencing are fortuitous if we think in terms of the criteria that often guide managers of manufacturing or service systems as they strive to satisfy a given set of customer orders. One type of concern is the response time that customers experience. Customers would like delays in satisfying their orders to be short. If we quantify this concern with the average completion time objective, an optimal sequencing strategy is shortest-first. Another type of concern is the pile of unfinished orders in inventory, which managers would like to keep small. If we quantify this concern with the average inventory objective, then shortest-first remains an optimal strategy. In the context of our simple sequencing problem, at least, the delay criterion and the inventory criterion may sound quite different, but they are actually different sides of the same coin: both are optimized by SPT sequencing. Thus, whether a manager focuses on customer delays or on unfinished work in progress, or tries to focus on both at once, shortest-first sequencing is a good idea (KOLEN et al., 1994).
To summarize the three presented scheduling algorithms, Table 3.1 shows the objectives presented in this chapter and which algorithm can be used for each objective. Note that List Scheduling can be used for almost all objectives; this happens because this algorithm uses a priority list to order the jobs, and depending on the way the priority list is constructed, the schedules produced by such a heuristic may be quite good for any of the objectives (KOLEN et al., 1994). The GA column in Table 3.1 shows which objectives can be optimized using a GA. Our proposed GA does not optimize all of them.
Table 3.1: A summary of which algorithm can be used for each objective.
Objectives List Scheduling LPT SPT GA
makespan (C_max) X X X
completion time X X
waiting time X X X
communication cost X
load balance X X X
maximize machine utilization X X X
minimize the total delay X X X
Figure 3.4: Example of LPT schedules (0 < λ ≤ 1: makespan = 4; 1 < λ ≤ 2: makespan = 3 + λ; 2 < λ ≤ 3: makespan = 5; 3 < λ ≤ 4: makespan = 2 + λ; 4 < λ ≤ 5: makespan = 6; 5 < λ ≤ 6: makespan = 1 + λ; 6 < λ ≤ 7: makespan = 7; λ ≥ 7: makespan = λ).
Figure 3.5: Example of SPT schedules (0 < λ ≤ 1: makespan = 5; 1 < λ ≤ 2: makespan = 4 + λ; 2 < λ ≤ 3: makespan = 6; 3 < λ ≤ 4: makespan = 6; λ ≥ 4: makespan = 2 + λ).
Chapter 4
Identifying Job Dependencies
This chapter introduces the concepts of Petri nets, shows an extension of Petri nets called Coloured Petri Nets (CPN), and presents a model definition based on CPN to represent the elements of MPhyScaS data and their relationships. This model will permit us to analyze all levels of the architecture, identifying parallel processes in each architecture level and causal dependencies between the processes.
4.1 Petri Nets
Petri nets are an excellent formal model for studying concurrent and distributed systems and have been widely applied in many different areas of computer science and other disciplines (MURATA, 1989). Since Carl Adam Petri originally developed Petri nets in 1962 (PETRI, 1962), Petri nets have evolved through four generations: the first-generation low-level Petri nets, primarily used for modeling system control (REISIG, 1985); the second-generation high-level Petri nets, for describing both system data and control (JENSEN; ROZENBERG, 1991); the third-generation hierarchical Petri nets, for abstracting system structures (HE; LEE, 1991); and the fourth-generation object-oriented Petri nets, for supporting modern system development approaches (AGHA; DE CINDIO; ROZENBERG, 2001).
A Petri net is a particular kind of directed graph, together with an initial state called the initial marking, M_0. The underlying graph N of a Petri net is a directed, weighted, bipartite graph consisting of two kinds of nodes, called places and transitions, where arcs are either from a place to a transition or from a transition to a place. In graphical
representation, places are drawn as circles, transitions as bars or boxes. Arcs are labeled
with their weights (positive integers), where a k-weighted arc can be interpreted as the set
of k parallel arcs. Labels for unity weight are usually omitted. A marking (state) assigns
to each place a nonnegative integer. If a marking assigns to place p a nonnegative integer
k, we say that p is marked with k tokens. Pictorially, we place k black dots (tokens) in
place p. A marking is denoted by M, an m-vector, where m is the total number of places.
The pth component of M, denoted by M(p), is the number of tokens in place p.
In modeling, using the concept of conditions and events, places represent conditions,
and transitions represent events. A transition (an event) has a certain number of input
and output places representing the pre-conditions and post-conditions of the event, re-
spectively. The presence of a token in a place is interpreted as holding the truth of the
condition associated with the place. In another interpretation, k tokens are put in a place
to indicate that k data items or resources are available. The formal definition of a Petri net
is given in Definition 4.1.1 (MURATA, 1989).
Definition 4.1.1 (Petri net formal definition) A Petri net is a 5-tuple, PN = (P, T, F, W, M_0), where:
P = {p_1, p_2, ..., p_m} is a finite set of places,
T = {t_1, t_2, ..., t_n} is a finite set of transitions,
F ⊆ (P × T) ∪ (T × P) is a set of arcs (flow relation),
W : F → {1, 2, 3, ...} is a weight function,
M_0 : P → {0, 1, 2, 3, ...} is the initial marking,
P ∩ T = ∅ and P ∪ T ≠ ∅.
A Petri net structure N = (P, T, F, W) without any specific initial marking is denoted by N.
A Petri net with a given initial marking is denoted by (N, M_0).
The behavior of many systems can be described in terms of system states and their
changes. In order to simulate the dynamic behavior of a system, a state or marking in a
Petri net is changed according to the following transition (firing) rule:
1. A transition t is said to be enabled if each input place p of t is marked with at least
w(p, t) tokens, where w(p, t) is the weight of the arc from p to t.
2. An enabled transition may or may not fire (depending on whether or not the event
actually takes place).
3. A firing of an enabled transition t removes w(p, t) tokens from each input place p
of t, and adds w(t, p) tokens to each output place p of t, where w(t, p) is the weight
of the arc from t to p.
A transition without any input place is called a source transition, and one without any
output place is called a sink transition. Note that a source transition is unconditionally
enabled, and that the firing of a sink transition consumes tokens, but does not produce
any.
A pair of a place p and a transition t is called a self-loop if p is both an input and
output place of t. A Petri net is said to be pure if it has no self-loops. A Petri net is said to
be ordinary if all of its arc weights are 1’s.
The above transition rule is illustrated in Figure 4.1 using the well-known chemical reaction 2H_2 + O_2 → 2H_2O. Two tokens in each input place in Figure 4.1(a) show that two units of H_2 and O_2 are available, and the transition t is enabled. After firing t, the marking will change to the one shown in Figure 4.1(b), where the transition t is no longer enabled.
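A minimal sketch of this firing rule, using the net of Figure 4.1 as data (the dictionary-based representation is our own, not a Petri net library):

```python
# Minimal sketch of the Petri net firing rule for the net of Figure 4.1.
# Arc weights: w(H2, t) = 2, w(O2, t) = 1, w(t, H2O) = 2.
marking = {"H2": 2, "O2": 2, "H2O": 0}
inputs = {"t": {"H2": 2, "O2": 1}}      # w(p, t) for each input place p
outputs = {"t": {"H2O": 2}}             # w(t, p) for each output place p

def enabled(t, m):
    """A transition is enabled if every input place holds at least w(p, t) tokens."""
    return all(m[p] >= w for p, w in inputs[t].items())

def fire(t, m):
    """Firing removes w(p, t) tokens from each input place and adds w(t, p) to each output place."""
    assert enabled(t, m)
    for p, w in inputs[t].items():
        m[p] -= w
    for p, w in outputs[t].items():
        m[p] += w

print(enabled("t", marking))   # True: two units of H2 and O2 are available
fire("t", marking)
print(marking)                 # {'H2': 0, 'O2': 1, 'H2O': 2}
print(enabled("t", marking))   # False: t is no longer enabled
```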
Petri nets have also been extended in many different ways to study specific system properties, such as performance, reliability, and schedulability. Well-known examples of extended Petri nets include timed Petri nets (WANG, 1998), stochastic Petri nets (MARSAN et al., 1998), and coloured Petri nets (JENSEN, 1991; ROUBTSOVA, 2004; FUKUZAWA; SAEKI, 2002). Here, we present the coloured Petri nets, which will be used for determining dependencies between processes.
4.2 Coloured Petri Nets
Coloured Petri nets (CP-nets or CPNs) are a modeling language developed for systems in which communication, synchronization and resource sharing play an important role. CP-nets combine the strengths of ordinary Petri nets with the strengths of a high-level programming language (ROUBTSOVA, 2004). Petri nets provide the primitives for process interaction, while the programming language provides the primitives for the definition of data types and the manipulation of data values.
Figure 4.1: An illustration of a transition (firing) rule: (a) the marking before firing the enabled transition t; (b) the marking after firing t, where t is disabled.
CPN models can be made with or without explicit reference to time. Untimed CPN models are usually used to validate the functional/logical correctness of a system, while timed CPN models are used to evaluate the performance of the system (FUKUZAWA; SAEKI, 2002). There are many other languages which can be used to investigate either the functional/logical correctness of a system or its performance. However, it is rather seldom to find modeling languages that are well suited to both kinds of analysis.

CP-nets also offer more formal verification methods, known as state space analysis and invariant analysis. In this way it is possible to prove, in the mathematical sense of the word, that a system has a certain set of behavioral properties.
There are three different, but closely related, reasons to make CPN models (and other kinds of behavioral models). First of all, a CPN model is a description of the modeled system, and it can be used as a specification (of a system which we want to build) or as a presentation (of a system which we want to explain to other people, or to ourselves). By creating a model we can investigate a new system before we construct it. This is an obvious advantage, in particular for systems where design errors may compromise security or be expensive to correct. Secondly, the behavior of a CPN model can be analyzed, either by means of simulation (which is equivalent to program execution and program debugging) or by means of more formal analysis methods (which are equivalent to program verification). Finally, it should be understood that the process of creating the description and performing the analysis usually gives the modeler a dramatically improved understanding of the modeled system, and it is often the case that this is more valuable than the description and the analysis results themselves (JENSEN, 1991).
Most of the advantages of CP-nets are subjective by nature and cannot be proved in any formal way. A list of advantages is presented below, accompanied by brief explanations and justifications (JENSEN, 1997).
1. CP-nets have a graphical representation - The graphical form is intuitively very
appealing. It is extremely easy to understand and grasp, even for people who are not
very familiar with the details of CP-nets. This is due to the fact that CPN diagrams
resemble many of the informal drawings which designers and engineers make while
they construct and analyze a system.
2. CP-nets have a well-defined semantics which unambiguously defines the be-
havior of each CP-net - It is the presence of the semantics which makes it possible
to implement simulators for CP-nets, and it is also the semantics which forms the
foundation for the formal analysis methods.
3. CP-nets are very general and can be used to describe a large variety of dif-
ferent systems - The applications of CP-nets range from informal systems (such
as the description of work processes) to formal systems (such as communication
protocols). They also range from software systems (such as distributed algorithms)
to hardware systems (such as VLSI chips). Finally, they range from systems with
a lot of concurrent processes (such as flexible manufacturing) to systems with no
concurrency (such as sequential algorithms).
4. CP-nets have very few, but powerful, primitives - The definition of CP-nets is
rather short and it builds upon standard concepts which many system modelers
already know from mathematics and programming languages. This means that it
is relatively easy to learn to use CP-nets. However, the small number of primitives
also means that it is much easier to develop strong analysis methods.
5. CP-nets have an explicit description of both states and actions - This is in con-
trast to most system description languages which describe either the states or the
actions, but not both. Using CP-nets, it is easily possible to change the point of fo-
cus during the work. At some instances of time it may be convenient to concentrate
on the states (and almost forget about the actions) while at other instances it may be
more convenient to concentrate on the actions (and almost forget about the states).
6. CP-nets have a semantics which builds upon true concurrency, instead of interleaving - This means that the notions of conflict, concurrency and causal dependency can be defined in a very natural and straightforward way. In an interleaving semantics, it is impossible to have two actions in the same step, and thus concurrency only means that the actions can occur after each other, in any order.
7. CP-nets offer hierarchical descriptions - This means that it is possible to construct a large CP-net by relating smaller CP-nets to each other, in a well-defined way. The hierarchy constructs of CP-nets play a role similar to that of the subroutines, procedures and modules of programming languages, and it is the existence of hierarchical CP-nets which makes it possible to model very large systems in a manageable and modular way.
8. CP-nets integrate the description of control and synchronization with the description of data manipulation - This means that on a single sheet of paper it can be seen what the environment, enabling conditions and effects of an action are. Many other graphical description languages work with graphs which only describe the environment of an action, while the detailed behavior is specified separately (often by means of unstructured prose).
9. CP-nets are stable towards minor changes of the modeled system - This is proved by many practical experiences and it means that small modifications of the modeled system do not completely change the structure of the CP-net. In particular, it should be observed that this is also true when a number of subnets describing different sequential processes are combined into a large CP-net. In many other description languages, e.g., finite automata, such a combination often yields a description which is difficult to relate to the original sub-descriptions.
10. CP-nets offer interactive simulations where the results are presented directly on the CPN diagram - The simulation makes it possible to debug a large model while it is being constructed, analogously to a good programmer debugging the individual parts of a program as he finishes them. The colours of the moving tokens can be inspected.
11. CP-nets have a large number of formal analysis methods by which proper-
ties of CP-nets can be proved - There are four basic classes of formal analysis
methods: construction of occurrence graphs (representing all reachable markings),
calculation and interpretation of system invariants (called place and transition in-
variants), reductions (which shrink the net without changing a certain selected set
of properties) and checking of structural properties (which guarantee certain behav-
ioral properties).
12. CP-nets have computer tools supporting their drawing, simulation and formal
analysis - This makes it possible to handle even large nets without drowning in de-
tails and without making trivial calculation errors. The existence of such computer
tools is extremely important for the practical use of CP-nets.
CP-nets can be analysed in four different ways. The first analysis method is interactive simulation. It is very similar to debugging and prototyping. This means that we can execute a CPN model to make a detailed investigation of the behaviour of the modelled system. It is possible to set breakpoints and to visualise the simulation results by means of different kinds of graphics.

The second analysis method is automatic simulation, which is similar to program execution. It allows a fast execution of thousands or millions of transitions. The purpose is to investigate the functional correctness of the system or to investigate the performance of the system, e.g. to identify bottlenecks, to predict the use of buffer space or the mean/maximal service time (JENSEN, 1991).
The third analysis method is occurrence graphs (also called state spaces or reachability graphs). The basic idea behind occurrence graphs is to construct a directed graph which has a node for each reachable system state and an arc for each possible state change. Obviously, such a graph may become very large, even for small CP-nets. However, it can be constructed and analysed totally automatically, and there exist techniques which make it possible to work with condensed occurrence graphs without losing analytic power (JENSEN, 1997). These techniques build upon equivalence classes.
The fourth analysis method is place invariants. This method is very similar to the
use of invariants in ordinary program verification. The user constructs a set of equations
which is proved to be satisfied for all reachable system states. The equations are used to
prove properties of the modelled system.
4.3 MPhyScaS - Petri Net Model

As already stated, a simulation involving a set of different physical phenomena demands a great amount of time. In order to improve MPhyScaS simulation performance, a model based on Petri nets is proposed for the practical analysis of simulation and simulator data. This model makes use of hierarchical parallelization concepts to identify which processes (instances) of each MPhyScaS layer can be executed in parallel. The following section presents this model.
Definition 4.3.1 (MPhyScaS Layers Architecture - Petri net model) Each layer in the MPhyScaS architecture is modeled through a set of places, described as follows:
P_k is the set of places representing the Kernel (such a set has a single place p_k);
P_b is the set of places representing the Blocks;
P_alg is the set of places representing the Algorithms;
P_g is the set of places representing the Groups;
P_gt is the set of places representing the GroupTasks;
P_ph is the set of places representing the Phenomena;
P_q is the set of places representing the Quantities;
P_s is the set of places representing the States.
In Chapter 2 we said that the Kernel represents the overall scenery of the simulation and that it stores system data related to the parameters of its loops and iterations. We also said that in a simulation only one Kernel can exist. Hence, the relationship between Block and Kernel is as presented in Definition 4.3.2.
Definition 4.3.2 (Block and Kernel Relationship - Petri Net Model) Let T_bk be a set of transitions, such that:
1. ⋃_{t_bk ∈ T_bk} O(t_bk) = P_k
2. ⋃_{t_bk ∈ T_bk} I(t_bk) = P_b
3. #P_b = #T_bk
4. let p_b1, p_b2 ∈ P_b, p_b1 ≠ p_b2; then ∄ t_bk ∈ T_bk | (p_b1 ∈ I(t_bk)) ∧ (p_b2 ∈ I(t_bk))

Assuming P_k = {p_k} and P_b = {p_b1, p_b2, p_b3}.
Based on Definition 4.3.2, Petri net PN_1 creates a valid relationship between Block and Kernel (see the graphical representation in Figure 4.2(a)). In contrast, Petri net PN_2 violates rule 4 of Definition 4.3.2: transition t_bk2 has both p_b2 and p_b3 as input places. A graphical representation of Petri nets PN_1 and PN_2 is depicted in Figure 4.2.
PN_1 = {
    {p_k, p_b1, p_b2, p_b3},
    {t_bk1, t_bk2, t_bk3},
    {{p_b1}, {p_b2}, {p_b3}},
    {{p_k}, {p_k}, {p_k}},
    {},
    {0, 0, 0, 0}
}

PN_2 = {
    {p_k, p_b1, p_b2, p_b3},
    {t_bk1, t_bk2, t_bk3},
    {{p_b1}, {p_b2, p_b3}, {}},
    {{p_k}, {p_k}, {p_k}},
    {},
    {0, 0, 0, 0}
}
The rules in Definition 4.3.2 exist because the Petri net must represent the relationships correctly, and Petri net PN_2 does not represent all relationships correctly. Besides the fact that PN_2 does not represent a single request from the Kernel properly (in PN_2 the Kernel asks Blocks 2 and 3 to execute something using only one request), the structure of PN_2 would not permit us to analyze each path that a single request goes through.
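To make the rule concrete, a minimal sketch (our own representation, not MPhyScaS code) that checks rule 4 of Definition 4.3.2, i.e., that no transition has more than one Block place among its inputs, applied to the input-place sets of PN_1 and PN_2:

```python
# Each net is represented here only by the Block input-place sets of its transitions,
# taken directly from the tuples of PN_1 and PN_2 above.
PN1_inputs = [{"p_b1"}, {"p_b2"}, {"p_b3"}]
PN2_inputs = [{"p_b1"}, {"p_b2", "p_b3"}, set()]

def satisfies_rule4(input_sets):
    """Rule 4 of Definition 4.3.2: no transition may have more than one Block input place."""
    return all(len(s) <= 1 for s in input_sets)

print(satisfies_rule4(PN1_inputs))  # True: PN_1 is valid
print(satisfies_rule4(PN2_inputs))  # False: t_bk2 has both p_b2 and p_b3 as inputs
```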
Figure 4.2: Graphical representation of Petri nets PN_1 and PN_2.

Definition 4.3.3 (Algorithm and Block Relationship - Petri Net Model) Let T_algb be a set of transitions, such that:
1. ⋃_{t_algb ∈ T_algb} O(t_algb) = P_b
2. ⋃_{t_algb ∈ T_algb} I(t_algb) = P_alg
3. #P_alg = #T_algb
4. let p_alg1, p_alg2 ∈ P_alg, p_alg1 ≠ p_alg2; then ∄ t_algb ∈ T_algb | (p_alg1 ∈ I(t_algb)) ∧ (p_alg2 ∈ I(t_algb))
5. let p_b1, p_b2 ∈ P_b, p_b1 ≠ p_b2; then ∄ t_algb ∈ T_algb | (p_b1 ∈ O(t_algb)) ∧ (p_b2 ∈ O(t_algb))

Assuming P_b = {p_b1, p_b2} and P_alg = {p_alg1, p_alg2, p_alg3}.
On the basis of Definition 4.3.3, a valid Petri net is represented by PN_3, which is graphically represented in Figure 4.3(a). Petri net PN_4 violates rules 4 and 5 of Definition 4.3.3: there are two input places (p_alg1 and p_alg2) for the transition t_algb1 and two output places (p_b1 and p_b2) for the transition t_algb3.
PN_3 = {
    {p_b1, p_b2, p_alg1, p_alg2, p_alg3},
    {t_algb1, t_algb2, t_algb3},
    {{p_alg1}, {p_alg2}, {p_alg3}},
    {{p_b1}, {p_b1}, {p_b2}},
    {},
    {0, 0, 0, 0, 0}
}

PN_4 = {
    {p_b1, p_b2, p_alg1, p_alg2, p_alg3},
    {t_algb1, t_algb2, t_algb3},
    {{p_alg1, p_alg2}, {}, {p_alg3}},
    {{p_b1}, {p_b1}, {p_b1, p_b2}},
    {},
    {0, 0, 0, 0, 0}
}
Figure 4.3: Graphical representation of Petri nets PN_3 and PN_4.
Besides the fact that Petri net PN_4 does not represent a single request from the Block properly (in PN_4, Block 1 asks Algorithms 1 and 2 to execute something using only one request), the structure of PN_4 would not permit us to analyze each path that a single request goes through.
Definition 4.3.4 (Group and Algorithm Relationship - Petri Net Model) Let T_galg be a set of transitions, such that:
1. ⋃_{t_galg ∈ T_galg} O(t_galg) = P_alg
2. ⋃_{t_galg ∈ T_galg} I(t_galg) = P_g
3. #P_g ≤ #T_galg
4. let p_g1, p_g2 ∈ P_g, p_g1 ≠ p_g2; then ∄ t_galg ∈ T_galg | (p_g1 ∈ I(t_galg)) ∧ (p_g2 ∈ I(t_galg))
5. let p_alg1, p_alg2 ∈ P_alg, p_alg1 ≠ p_alg2; then ∄ t_galg ∈ T_galg | (p_alg1 ∈ O(t_galg)) ∧ (p_alg2 ∈ O(t_galg))
6. let p_b1 ∈ P_b, p_alg1 ∈ P_alg, p_g1 ∈ P_g; for every t_galg with p_g1 ∈ I(t_galg) and p_alg1 ∈ O(t_galg), and for the t_algb with p_alg1 ∈ I(t_algb), it holds that O(t_algb) = {p_b1} (i.e., all Algorithms supported by the same Group must belong to the same Block p_b1)

Assuming P_b = {p_b1}, P_alg = {p_alg1, p_alg2} and P_g = {p_g1, p_g2}.
Petri net PN_5 shows an example of a valid Petri net on the basis of Definition 4.3.4. The graphical representation is depicted in Figure 4.4(a).
PN_5 = {
    {p_b1, p_alg1, p_alg2, p_g1},
    {t_algb1, t_algb2, t_galg1, t_galg2},
    {{p_alg1}, {p_alg2}, {p_g1}, {p_g1}},
    {{p_b1}, {p_b1}, {p_alg1}, {p_alg2}},
    {},
    {0, 0, 0, 0}
}
It is possible to see in Petri net PN_6 the violation of rules 4 and 5 of Definition 4.3.4. The transition t_galg2 has two input places (p_g2 and p_g3) and the transition t_galg1 has two output places (p_alg1 and p_alg2), which characterize the two violations, respectively.
Petri net PN_7 is an example of a rule 6 violation in Definition 4.3.4. Here, p_g1 provides support to Algorithms of different Blocks (the Algorithms represented by the places p_alg1 and p_alg2 belong to the Blocks represented by the places p_b1 and p_b2, respectively). Figure 4.4 shows the graphical representation of Petri nets PN_5, PN_6 and PN_7.
PN_6 = {
    {p_alg1, p_alg2, p_alg3, p_g1, p_g2, p_g3},
    {t_galg1, t_galg2, t_galg3},
    {{p_g1}, {p_g2, p_g3}, {}},
    {{p_alg1, p_alg2}, {p_alg2}, {p_alg3}},
    {},
    {0, 0, 0, 0, 0, 0}
}

PN_7 = {
    {p_b1, p_b2, p_alg1, p_alg2, p_g1},
    {t_algb1, t_algb2, t_galg1, t_galg2},
    {{p_alg1}, {p_alg2}, {p_g1}, {p_g1}},
    {{p_b1}, {p_b2}, {p_alg1}, {p_alg2}},
    {},
    {0, 0, 0, 0, 0}
}
Besides the fact that Petri net PN_6 does not represent a single request from the Algorithm properly (in PN_6, Algorithm 2 asks Groups 2 and 3 to execute something using only one request), the structure of PN_6 would not permit us to analyze each path that a single request goes through. Another error in the PN_6 representation is that Algorithms 1 and 2 ask Group 1 to execute simultaneously, which is impossible in the MPhyScaS architecture definition.

In Petri net PN_7, two different Algorithms (Algorithms 1 and 2) belonging to different Blocks (Blocks 1 and 2, respectively) ask for the execution of the same Group (Group 1). This is also impossible in the MPhyScaS architecture definition.
Figure 4.4: Graphical representation of Petri nets PN_5, PN_6 and PN_7.
Definition 4.3.5 (GroupTask and Group Relationship - Petri Net Model) Let T_gtg be a set of transitions, such that:
1. ⋃_{t_gtg ∈ T_gtg} O(t_gtg) = P_g
2. ⋃_{t_gtg ∈ T_gtg} I(t_gtg) = P_gt
3. #P_gt = #T_gtg
4. let p_gt1, p_gt2 ∈ P_gt, p_gt1 ≠ p_gt2; then ∄ t_gtg ∈ T_gtg | (p_gt1 ∈ I(t_gtg)) ∧ (p_gt2 ∈ I(t_gtg))
5. let p_g1, p_g2 ∈ P_g, p_g1 ≠ p_g2; then ∄ t_gtg ∈ T_gtg | (p_g1 ∈ O(t_gtg)) ∧ (p_g2 ∈ O(t_gtg))

Assuming P_g = {p_g1, p_g2} and P_gt = {p_gt1, p_gt2, p_gt3}.
One can note that Petri net PN_8 is valid on the basis of Definition 4.3.5 (see the graphical representation in Figure 4.5(a)), while PN_9 violates rules 4 and 5 of the same definition. In the first violation of PN_9, the transition t_gtg1 has two input places (p_gt1 and p_gt2), and in the second one, the transition t_gtg3 has two output places (p_g1 and p_g2). The graphical representations of Petri nets PN_8 and PN_9 are depicted in Figure 4.5.
PN_8 = {
    {p_g1, p_g2, p_gt1, p_gt2, p_gt3},
    {t_gtg1, t_gtg2, t_gtg3},
    {{p_gt1}, {p_gt2}, {p_gt3}},
    {{p_g1}, {p_g1}, {p_g2}},
    {},
    {0, 0, 0, 0, 0}
}

PN_9 = {
    {p_g1, p_g2, p_gt1, p_gt2, p_gt3},
    {t_gtg1, t_gtg2, t_gtg3},
    {{p_gt1, p_gt2}, {}, {p_gt3}},
    {{p_g1}, {p_g1}, {p_g1, p_g2}},
    {},
    {0, 0, 0, 0, 0}
}
Figure 4.5: Graphical representation of Petri nets PN_8 and PN_9.
Besides the fact that Petri net PN_9 does not represent a single request from the Group properly (in PN_9, Group 1 asks GroupTasks 1 and 2 to execute something using only one request), the structure of PN_9 would not permit us to analyze each path that a single request goes through.
Definition 4.3.6 (Phenomenon and GroupTask Relationship - Petri Net Model) Let T_phgt be a set of transitions, such that:
1. ⋃_{t_phgt ∈ T_phgt} O(t_phgt) = P_gt
2. ⋃_{t_phgt ∈ T_phgt} I(t_phgt) = P_ph
3. #P_ph ≤ #T_phgt
4. let p_ph1, p_ph2 ∈ P_ph, p_ph1 ≠ p_ph2; then ∄ t_phgt ∈ T_phgt | (p_ph1 ∈ I(t_phgt)) ∧ (p_ph2 ∈ I(t_phgt))
5. let p_gt1, p_gt2 ∈ P_gt, p_gt1 ≠ p_gt2; then ∄ t_phgt ∈ T_phgt | (p_gt1 ∈ O(t_phgt)) ∧ (p_gt2 ∈ O(t_phgt))
6. let p_g1 ∈ P_g, p_gt1 ∈ P_gt, p_ph1 ∈ P_ph; for every t_phgt with p_ph1 ∈ I(t_phgt) and p_gt1 ∈ O(t_phgt), and for the t_gtg with p_gt1 ∈ I(t_gtg), it holds that O(t_gtg) = {p_g1} (i.e., all GroupTasks supported by the same Phenomenon must belong to the same Group p_g1)

Assuming P_g = {p_g1}, P_gt = {p_gt1, p_gt2} and P_ph = {p_ph1, p_ph2}.
On the basis of Definition 4.3.6, Petri net PN_10 (see Figure 4.6(a)) is valid.
PN_10 = {
    {p_g1, p_gt1, p_gt2, p_ph1, p_ph2},
    {t_gtg1, t_gtg2, t_phgt1, t_phgt2, t_phgt3},
    {{p_gt1}, {p_gt2}, {p_ph1}, {p_ph2}, {p_ph2}},
    {{p_g1}, {p_g1}, {p_gt1}, {p_gt1}, {p_gt2}},
    {},
    {0, 0, 0, 0, 0}
}
It is possible to note that in Petri net PN_11 there is a transition (t_phgt1) that has two input places (p_ph1 and p_ph2). Similarly, the transition t_phgt3 has two output places (p_gt1 and p_gt2). Thus, Petri net PN_11 violates rules 4 and 5 of Definition 4.3.6, respectively.

Petri net PN_12 violates rule 6 of Definition 4.3.6. Here, p_ph1 provides support to two GroupTasks that belong to different Groups (the GroupTasks represented by the places p_gt1 and p_gt2 belong to the Groups represented by the places p_g1 and p_g2, respectively). Petri nets PN_10, PN_11 and PN_12 are depicted in Figure 4.6.
PN_11 = {
    {p_gt1, p_gt2, p_ph1, p_ph2, p_ph3},
    {t_phgt1, t_phgt2, t_phgt3},
    {{p_ph1, p_ph2}, {}, {p_ph3}},
    {{p_gt1}, {p_gt1}, {p_gt1, p_gt2}},
    {},
    {0, 0, 0, 0, 0}
}

PN_12 = {
    {p_g1, p_g2, p_gt1, p_gt2, p_ph1},
    {t_gtg1, t_gtg2, t_phgt1, t_phgt2},
    {{p_gt1}, {p_gt2}, {p_ph1}, {p_ph1}},
    {{p_g1}, {p_g2}, {p_gt1}, {p_gt2}},
    {},
    {0, 0, 0, 0, 0}
}
Figure 4.6: Graphical representation of Petri nets PN_10, PN_11 and PN_12.
Besides the fact that Petri net PN_11 does not represent a single request from the GroupTask properly (in PN_11, GroupTask 1 asks Phenomena 1 and 2 to execute something using only one request), the structure of PN_11 would not permit us to analyze each path that a single request goes through.

In Petri net PN_12, two different GroupTasks (GroupTasks 1 and 2) belonging to different Groups (Groups 1 and 2, respectively) ask for the execution of the same Phenomenon (Phenomenon 1). This is impossible in the MPhyScaS architecture definition.
Definition 4.3.7 (Quantity and Phenomenon Relationship - Petri Net Model) Let T_qph be a set of transitions, such that:
1. ⋃_{t_qph ∈ T_qph} O(t_qph) = P_ph
2. ⋃_{t_qph ∈ T_qph} I(t_qph) = P_q
3. #P_q = #T_qph
4. let p_q1, p_q2 ∈ P_q, p_q1 ≠ p_q2; then ∄ t_qph ∈ T_qph | (p_q1 ∈ I(t_qph)) ∧ (p_q2 ∈ I(t_qph))
5. let p_ph1, p_ph2 ∈ P_ph, p_ph1 ≠ p_ph2; then ∄ t_qph ∈ T_qph | (p_ph1 ∈ O(t_qph)) ∧ (p_ph2 ∈ O(t_qph))

Assuming P_ph = {p_ph1, p_ph2} and P_q = {p_q1, p_q2, p_q3}.
Based on Definition 4.3.7, Petri net PN_13 creates a valid relationship between Quantity and Phenomenon (see the graphical representation in Figure 4.7(a)), while PN_14 violates rules 4 and 5 of the same definition. In PN_14, the transition t_qph1 has two input places (p_q1 and p_q2) and the transition t_qph3 has two output places (p_ph1 and p_ph2).
PN_13 = {
    {p_ph1, p_ph2, p_q1, p_q2, p_q3},
    {t_qph1, t_qph2, t_qph3},
    {{p_q1}, {p_q2}, {p_q3}},
    {{p_ph1}, {p_ph1}, {p_ph2}},
    {},
    {0, 0, 0, 0, 0}
}

PN_14 = {
    {p_ph1, p_ph2, p_q1, p_q2, p_q3},
    {t_qph1, t_qph2, t_qph3},
    {{p_q1, p_q2}, {}, {p_q3}},
    {{p_ph1}, {p_ph1}, {p_ph1, p_ph2}},
    {},
    {0, 0, 0, 0, 0}
}
Figure 4.7: Graphical representation of Petri nets PN_13 and PN_14.

Besides the fact that Petri net PN_14 does not represent a single request from the Phenomenon properly (in PN_14, Phenomenon 1 asks Quantities 1 and 2 to execute something using only one request), the structure of PN_14 would not permit us to analyze each path that a single request goes through.
Quantities are responsible for assembling global states (matrices, vectors and scalars), so there will be a connection from a global state to a quantity when that quantity is responsible for the assembling of that global state. In order to characterize a coupling, when it exists, there will be a connection from the quantity to the global state; this characterizes that the quantity needs the global state to be already assembled. Therefore, the global states are the elements at the bottom of our model.
4.4 Model Analysis
From the model defined, we realize that the tokens should be valued, because they represent the task that is executing at that moment. Therefore, we refined the model to become a Coloured Petri Net (CPN) model. For this purpose, a new place is first created and it is connected through a transition to all places that represent global states (the places at the bottom of the model). This new place will contain a token, which will be the single token in the net we created. Secondly, all transitions will have a restriction so that only tokens with a certain value can fire them. The token value allowed to fire a given transition is a string created through the composition of all place names in the path from the highest level element (Kernel) down to the transition. For example, the transition that restricts the token to be 0.2.1 is the transition between Block 2 and Algorithm 1, where Block 2 is related to the Kernel, which has code 0.

When the model becomes a CP-net, we use CPN Tools (CPNGROUP, 2009) to analyze it. We use the occurrence graphs method (also called state spaces or reachability graphs) to perform the analysis. In other words, before beginning the analysis, it is necessary to generate the state space of the net. Then, the analysis is as presented in Pseudo-Code 2.
Pseudo-Code 2: Model analysis pseudo-code.
1  foreach node of the state space do
2      boolean dependence = Verify Dependence(node)  // verify whether there is more than one token and whether they are at the same level
3      if dependence then
4          Write In File(node)
5      end
6  end
Line 1 of Pseudo-Code 2 shows that each node of the state space, i.e., each reachable system state, will be analyzed. In line 2 it is verified whether there is any dependence in the node. For this, it is verified whether there is more than one token in the net and whether the tokens are at the same level of the MPhyScaS architecture. If a dependence exists, the node, indicating the places that contain the tokens, will be written to a file (lines 3 to 5).
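A minimal sketch of the dependence check performed for each state-space node is given below; the node representation and the way the architectural level is extracted from a place name are our own assumptions, not CPN Tools code:

```python
# Hypothetical sketch of Verify Dependence: a node is a list of (place, token) pairs,
# and level_of(place) returns the MPhyScaS level of that place (e.g. "Quantity").
def level_of(place):
    return place.split("_")[0]          # e.g. "Quantity_1" -> "Quantity" (our naming convention)

def verify_dependence(node):
    """There is a dependence if more than one token is present and at least two of
    them sit in places of the same architectural level."""
    if len(node) <= 1:
        return False
    levels = [level_of(place) for place, _ in node]
    return len(levels) != len(set(levels))

node = [("Quantity_0", "0.0.0.0"), ("Quantity_1", "1.1.0.1")]
print(verify_dependence(node))          # True: two tokens at the Quantity level
```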
To illustrate the analysis, Figure 4.8 shows a part of a Petri net that represents the execution of two GroupTasks. GroupTask 0 (GT 0) requires Phenomenon 0 (Ph 0) to execute Quantity 0 (q 0), which assembles the global state 0 (S 0). In the same Petri net, GroupTask 1 (GT 1) requires Phenomenon 0 (Ph 0) to execute Quantity 1 (q 1), which needs global state 0 (S 0) to be already assembled in order to be able to assemble the global state 1 (S 1).

In order to make the model analysis easier, we invert the direction of each arc in the Petri net, as shown in Figure 4.9. This helps us to identify the entire path through which a requirement of a task execution has passed. In this way, the analysis begins at the execution of a task (State level) and goes through the levels, following a bottom-up direction, until it reaches the top level (Kernel level).
Figure 4.8: An example of a Petri net that represents the execution of two GroupTasks.

Figure 4.9: The Petri net with arcs direction inverted.

The first task to execute is represented by a token with value 0.0.0.0. This token is in
the place that represents the global state 0 (S 0) because this is the global state assembled
by the task 0.0.0.0 which is the first task of the simulation (see Figure 4.10(a)). This
marking enables the transition that makes the token go to the place that represents the
Quantity 0. This is the quantity that assembles the global state 0.
When the token is in the place that represents Quantity 0 (Figure 4.10(b)), it enables the transition that makes the token go to the place that represents Phenomenon 0. In the next step, the token is in the place that represents Phenomenon 0 (as depicted in Figure 4.10(c)). It enables the transition that makes the token go to the place that represents GroupTask 0.
Figure 4.10: The Petri net representing the execution of the first executed task.

The analysis continues until the Kernel level is reached. When this occurs, the token
assumes the value of the next task of the simulation. In the case of the example, the token assumes the value 1.1.0.1. It is placed in the global state 1, which is the next global state to be assembled in the simulation. Figure 4.11(a) shows the token in the place that represents global state 1. In this case, the token enables the transition that makes the token go to the place that represents Quantity 1.

When the transition is fired and the token is in the Quantity 1 place, the token enables two transitions to fire. The first transition is the one which makes the token go to the place that represents Phenomenon 0. The second transition is the one which makes the token go to the place that represents the global state 0. Figure 4.11(b) depicts this situation.
The dependencies are identified when a token enables two transitions. In this case, we examine all of the state space constructed from this moment on. The analysis searches for objects that belong to a level that executes something. These levels are the Kernel, Algorithm, QuantityTask, GroupTask and Quantity levels; the other levels only forward a requirement, executing nothing. There is a dependence between the objects found at the same level: the object that came from an already executed task must be executed before the other one. In the example, the analysis finds two dependences: Quantity 1 depends on Quantity 0 for executing, and GroupTask 1 depends on GroupTask 0 for executing.
Figure 4.11: The Petri net representing the execution of the second executed task.

In the Coloured Petri Net generated, each transition restricts the token value that can
enable and fire it. This is important because the analysis does not identify a dependence when an object depends on a second object and that second object depends on a third one; in other words, the analysis does not identify a dependence between the first object and the third one.
Chapter 5
Proposed Algorithm
This chapter introduces the concepts of Genetic Algorithms (GA) and presents a new GA approach. This new approach uses the dependencies found in the CPN analysis and also considers some new effects in the scheduling process, aiming at solving the MPhyScaS scheduling problem.
5.1 Genetic Algorithms
Evolutionary computation (EC) has become a standard term to indicate problem-solving techniques which use design principles inspired by models of the natural evolution of species.

Historically, there are three main algorithmic developments within the field of EC: evolution strategies (BEYER, 2001), evolutionary programming (ENGELBRECHT, 2007), and genetic algorithms (COLEY, 1998). Common to these approaches is that they are population-based algorithms that use operators inspired by population genetics to explore the search space (DORIGO; STüTZLE, 2004). Each individual in the algorithm represents, directly or indirectly, a solution to the problem under consideration.
Differences among the various EC algorithms concern the particular representations chosen for the individuals and the way the genetic operators are implemented. For example, genetic algorithms typically use binary or discrete-valued variables to represent the information in individuals and favor the use of recombination, while evolution strategies and evolutionary programming often use continuous variables and put more emphasis on the mutation operator. Nevertheless, the differences between the different paradigms are becoming more and more blurred (DORIGO; STüTZLE, 2004).
A general outline of an EC algorithm is given in Pseudo-Code 3, where pop denotes the population of individuals. To define an EC algorithm, the following functions have to be specified:
The function Initialize Population, which generates the initial population;
The function Evaluate Population, which computes the fitness values of the
individuals;
The function Best Of Population, which returns the best individual in the cur-
rent population;
The function Select and Recombine, which repeatedly selects two or more in-
dividuals and combine them to form one or more new individuals;
The function Mutate, which introduces a small random perturbation, when applied
to one individual;
Pseudo-Code 3: Evolutionary computation algorithm pseudo-code.
pop ← Initialize Population
Evaluate Population(pop)
sbest ← Best Of Population(pop)
while termination condition not met do
    pop ← Select And Recombine(pop)
    pop′ ← Mutate(pop)
    Evaluate Population(pop′)
    s ← Best Of Population(pop′)
    if f(s) < f(sbest) then
        sbest ← s
    end
    pop ← pop′
end
return sbest
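A minimal executable sketch following Pseudo-Code 3 is given below; the representation, fitness function and operators are placeholders chosen only to make the skeleton runnable, not the operators proposed in this dissertation:

```python
import random

def run_ec(fitness, init, select_and_recombine, mutate, generations=100):
    """Generic EC skeleton following Pseudo-Code 3 (minimization)."""
    pop = init()
    best = min(pop, key=fitness)
    for _ in range(generations):
        pop = select_and_recombine(pop, fitness)
        pop = [mutate(ind) for ind in pop]
        candidate = min(pop, key=fitness)
        if fitness(candidate) < fitness(best):
            best = candidate
    return best

# Placeholder problem: minimize the sum of a bit string (optimum is all zeros).
fitness = sum
init = lambda: [[random.randint(0, 1) for _ in range(10)] for _ in range(20)]

def select_and_recombine(pop, fitness):
    parents = sorted(pop, key=fitness)[:10]               # truncation selection
    children = []
    while len(children) < 20:
        a, b = random.sample(parents, 2)
        cut = random.randrange(1, len(a))
        children.append(a[:cut] + b[cut:])                # one-point crossover
    return children

def mutate(ind, rate=0.05):
    return [1 - g if random.random() < rate else g for g in ind]

print(run_ec(fitness, init, select_and_recombine, mutate))
```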
The basic principles of GA were first proposed by Holland (HOLLAND, 1975). Thereafter, a series of literature and reports became available. GA is inspired by the mechanism of natural selection, where stronger individuals are likely to be the winners in a competing environment. Here, GA uses a direct analogy of such natural evolution. Through the genetic evolution method, an optimal solution can be found and represented by the final winner of the genetic game.
GA presumes that a potential solution of any problem is an individual and can be represented by a set of parameters. These parameters are regarded as the genes of a chromosome and can be structured as a string of values in binary form. A positive value, generally known as the fitness value, is used to reflect the degree of goodness of the chromosome for the problem, and it is highly related to the chromosome's objective value (MAN; TANG; KWONG, 1999).
Throughout a genetic evolution, the fitter chromosome has a tendency to yield good-quality offspring, which means a better solution to the problem (BANZHAF et al., 1998). In a practical GA application, a population pool of chromosomes has to be installed, and these can be randomly set initially. The size of this population varies from one problem to another (LOBO; LIMA, 2005). In each cycle of genetic operation, termed an evolving process, a subsequent generation is created from the chromosomes in the current population. This can only succeed if a group of these chromosomes, generally called parents, is selected via a specific selection routine. The genes of the parents are mixed and recombined for the production of offspring in the next generation. It is expected that, from this process of evolution (manipulation of genes), the better chromosomes will create a larger number of offspring and thus have a higher chance of surviving in the subsequent generation, emulating the survival-of-the-fittest mechanism in nature (MAN; TANG; KWONG, 1999).
The cycle of evolution is repeated until a desired termination criterion is reached. This criterion can be a number of evolution cycles (computational runs), the amount of variation of individuals between different generations, or a pre-defined value of fitness.
5.1.1 Initial Population
The major questions to consider are, firstly, the size of the population and, secondly, the method by which the individuals are chosen. The choice of the population size has been approached from several theoretical points of view, although the underlying idea is always a trade-off between efficiency and effectiveness. Intuitively, it would seem that there should be some optimal value for a given string length, on the grounds that too small a population would not allow sufficient room for exploring the search space effectively, while too large a population would so impair the efficiency of the method that no solution could be expected in a reasonable amount of computation (REEVES; YAMADA, 1998).
5.1.2 Selection
During each successive generation, a proportion of the existing population is selected to
breed a new generation. Individual solutions are selected through a fitness-based process,
where fitter solutions (as measured by a fitness function) are typically more likely to be
selected. Certain selection methods rate the fitness of each solution and preferentially
select the best solutions. Other methods rate only a random sample of the population, as
this process may be very time-consuming (BANZHAF et al., 1998).
Most functions are stochastic and designed so that a small proportion of less fit so-
lutions are selected. This helps keep the diversity of the population large, preventing
premature convergence on poor solutions (REZENDE, 2002). Popular and well-studied
selection methods include fitness-proportionate selection (also known as roulette wheel
selection) and tournament selection.
5.1.2.1 Fitness-Proportionate Selection
Fitness-proportionate selection, also known as roulette wheel selection, is a genetic opera-
tor used in genetic algorithms for selecting potentially useful solutions for recombination.
In fitness-proportionate selection, as in all selection methods, the fitness function assigns a fitness to possible solutions or chromosomes. This fitness level is used to associate a probability of selection with each individual chromosome (see Figure 5.1). If $f_i$ is the fitness of individual $i$ in the population, its probability of being selected is

$$p_i = \frac{f_i}{\sum_{j=1}^{N} f_j},$$
where N is the number of individuals in the population. While candidate solutions
with a higher fitness will be less likely to be eliminated, there is still a chance that they
may be (REZENDE, 2002). Contrast this with a less sophisticated selection algorithm,
such as truncation selection, which will eliminate a fixed percentage of the weakest candi-
dates. With fitness-proportionate selection there is a chance some weaker solutions may
survive the selection process; this is an advantage, as though a solution may be weak,
it may include some component which could prove useful following the recombination
process (BANZHAF et al., 1998).
Figure 5.1: Fitness-proportionate selection example.
The analogy to a roulette wheel can be envisaged by imagining a roulette wheel in which each candidate solution represents a pocket on the wheel; the size of each pocket is proportionate to the probability of selection of the solution. Selecting N chromosomes from the population is equivalent to playing N games on the roulette wheel, as each candidate is drawn independently (BANZHAF et al., 1998).
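To make the mechanism concrete, the following minimal Python sketch implements fitness-proportionate selection as described above. The function name and the population representation are illustrative assumptions, and non-negative fitness values are assumed (for a minimisation problem such as scheduling, the raw objective would first have to be converted into a non-negative goodness score).

import random

def roulette_wheel_select(population, fitnesses):
    """Select one individual with probability proportional to its fitness."""
    total = sum(fitnesses)                       # total size of the wheel
    pick = random.uniform(0, total)              # spin the wheel
    cumulative = 0.0
    for individual, fitness in zip(population, fitnesses):
        cumulative += fitness
        if pick <= cumulative:                   # pocket containing the pick
            return individual
    return population[-1]                        # guard against rounding errors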
5.1.2.2 Tournament Selection
Tournament selection is one of many selection methods used in genetic algorithms. It runs a tournament among a few individuals chosen at random from the population and selects the winner (the one with the best fitness) for crossover. The selection pressure is easily adjusted by changing the tournament size: if the tournament size is larger, weak individuals have a smaller chance of being selected. Figure 5.2 gives an example where four individuals are selected for the tournament.
Deterministic tournament selection always selects the best individual of a tournament (selection probability p = 1). A 1-way tournament (k = 1) is equivalent to random selection.
Figure 5.2: Tournament selection example.
The chosen individual can be removed from the population that the selection is made from, if desired; otherwise, individuals can be selected more than once for the next generation (REEVES; YAMADA, 1998).
Tournament selection has several benefits: it is efficient to code, works on parallel architectures, and allows the selection pressure to be easily adjusted (ZALZALA; FLEMING, 1997).
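A minimal Python sketch of tournament selection is given below; the tournament size k is the parameter that controls the selection pressure, and the function name and defaults are illustrative assumptions (the default assumes a minimisation problem, as in the scheduling case).

import random

def tournament_select(population, fitnesses, k=4, minimize=True):
    """Run a k-way tournament and return the winner."""
    contenders = random.sample(range(len(population)), k)   # k distinct indices
    pick = min if minimize else max                          # best w.r.t. fitness
    winner = pick(contenders, key=lambda i: fitnesses[i])
    return population[winner]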
5.1.3 Crossover
Crossover is also known as recombination. In the crossover operation, new strings (solutions) are created by exchanging information among the strings present in the population. Recall that a string representation of a solution contains information in its genes. In crossover, two strings are picked from the population and some portions of these strings are exchanged between them.
A common implementation of crossover uses the single-point crossover process, in which a crossing site is randomly chosen along the string length and all bits to the right side of this crossing site are exchanged between the two parent strings (BAGCHI, 1999), as shown in Figure 5.3.
For multipoint crossover, $m$ crossover positions $k_i \in \{1, 2, ..., l-1\}$, where the $k_i$ are the crossover points and $l$ is the length of the chromosome, are chosen at random with no duplicates and sorted into ascending order. Then, the bits between successive crossover points are exchanged between the two parents to produce two new offspring. The section between the first allele position and the first crossover point is not exchanged between individuals (ZALZALA; FLEMING, 1997). This process is illustrated in Figure 5.4.

Figure 5.3: Single-point crossover example.

Figure 5.4: Multipoint crossover example.

The idea behind multipoint crossover, and indeed many of the variations on the crossover operator, is that the parts of the chromosome representation that contribute most to the performance of a particular individual may not necessarily be contained in adjacent substrings (COLEY, 1998). Further, the disruptive nature of multipoint crossover appears to encourage the exploration of the search space, rather than favouring convergence to highly fit individuals early in the search, thus making the search more robust (ZALZALA; FLEMING, 1997).
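A minimal Python sketch of the single-point and multipoint operators described above is shown next; the parents are assumed to be equal-length lists of bits and the function names are illustrative.

import random

def single_point_crossover(p1, p2):
    """Swap the tails of two equal-length chromosomes at one random cut point."""
    cut = random.randint(1, len(p1) - 1)
    return p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]

def multipoint_crossover(p1, p2, m=2):
    """Exchange the segments between successive cut points (m cuts, no duplicates);
    the segment before the first cut is not exchanged."""
    cuts = sorted(random.sample(range(1, len(p1)), m))
    c1, c2 = list(p1), list(p2)
    swap, prev = False, 0
    for cut in cuts + [len(p1)]:
        if swap:
            c1[prev:cut], c2[prev:cut] = c2[prev:cut], c1[prev:cut]
        swap = not swap
        prev = cut
    return c1, c2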
Single and multipoint crossover define cross points as places between loci where a chromosome can be split. Uniform crossover (SYWERDA, 1989) generalises this scheme to make every locus a potential crossover point. A crossover mask, the same length as the chromosome structures, is created at random, and the parity of the bits in the mask indicates which parent will supply the offspring with each bit.
Uniform crossover, like multipoint crossover, has been claimed to reduce the bias associated with the length of the binary representation used and the particular coding for a given parameter set. This helps to overcome the bias in single-point crossover towards short substrings without requiring precise understanding of the significance of individual bits in the chromosome representation (ZALZALA; FLEMING, 1997). Uniform crossover does not use crossover points; instead, it determines a mask which indicates from which parent each gene will be inherited. In Figure 5.5, the value 1 in the mask indicates that the corresponding gene of the first parent will be inherited by the first offspring, and the corresponding gene of the second parent will be inherited by the second offspring. For the value 0 in the mask, the reverse occurs.
Figure 5.5: Uniform crossover example.
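A minimal Python sketch of uniform crossover following the mask convention of Figure 5.5 (a mask bit of 1 copies the gene of the first parent into the first offspring); the function name is illustrative.

import random

def uniform_crossover(parent1, parent2):
    """Build two offspring gene by gene according to a random binary mask."""
    mask = [random.randint(0, 1) for _ in parent1]
    child1 = [a if m == 1 else b for a, b, m in zip(parent1, parent2, mask)]
    child2 = [b if m == 1 else a for a, b, m in zip(parent1, parent2, mask)]
    return child1, child2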
It is expected from the above operation that if good substrings from the parents get combined by crossover, the children are likely to have improved fitness. Further, because pressure is exerted by the survival-of-the-fittest selection method used in forming the population, when good strings are created in the progeny by crossover, they help the search process by propagating their superior characteristics in the ensuing generations (BAGCHI, 1999). However, since the location of the appropriate (i.e. best) crossover site cannot be known a priori without considerable additional computation, a random site is chosen in practice (COLEY, 1998).
Still, the effect of crossover can be detrimental or beneficial. Therefore, to preserve the good strings in the population, not all strings in the population are used in crossover. A crossover probability ($p_c$) is used to decide whether a given member of the population will be crossed.
5.1.4 Mutation
While a crossover operator attempts to produce new strings of superior fitness by effecting large changes in a string's makeup, the need for local search around a current solution also exists. This is accomplished by mutation. Mutation additionally aims to maintain diversity in the population (BAGCHI, 1999).
In natural evolution, mutation is a random process where one allele of a gene is re-
placed by another to produce a new genetic structure. In GAs, mutation is randomly
applied with low probability and modifies elements in the chromosomes. Usually consid-
ered as a background operator, the role of mutation is often seen as providing a guarantee
that the probability of searching any given string will never be zero and acting as a safety
net to recover good genetic material which may be lost through the action of selection and
crossover (ZALZALA; FLEMING, 1997).
In practice, for example, if binary code is being used, a single bit in a string may
be altered (from 0 to 1, or 1 to 0) with a small probability, creating a new solution (see
Figure 5.6).
Figure 5.6: Mutation example.
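A minimal Python sketch of this bit-flip mutation, flipping each bit independently with a small probability; the function name and the default probability are illustrative.

import random

def bit_flip_mutation(chromosome, p_m=0.01):
    """Flip each bit of a binary chromosome with probability p_m."""
    return [1 - gene if random.random() < p_m else gene for gene in chromosome]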
Crossover aims at recombining parts of good substrings from good parent strings to create a better offspring. Mutation, on the other hand, alters a single child string locally in the hope of creating a superior child string.
5.1.5 Elitism
Fitness-proportionate selection does not guarantee the selection of any particular individual, including the fittest. Even when the fittest individual is much fitter than any other, it will occasionally not be selected. To not be selected is to die. Thus, with fitness-proportionate selection, the best solution to the problem discovered so far can be regularly thrown away.
Although it appears counterproductive, this can be advantageous for some problems because it slows the algorithm, allowing it to explore more of the search space before convergence (COLEY, 1998). There is a balance between exploration of the search space and exploitation of the best solutions found so far: the more the search exploits, the faster the progress of the algorithm, but the greater the possibility of the algorithm failing to locate the true global optimum (BAGCHI, 1999).
For many applications the search speed can be greatly improved by not losing the best, or elite, members between generations. Ensuring the propagation of the elite members is termed elitism, and it requires not only that the elite members are selected, but also that copies of them are not disrupted by crossover or mutation (COLEY, 1998).
5.1.6 Termination
Like biological evolution, which goes on and on, a genetic algorithm continues to run, evolving solutions until explicitly terminated. After each population is evaluated, the algorithm checks a set of termination conditions. There are several common termination conditions used in genetic algorithms (BANZHAF et al., 1998):
Maximum generations - the genetic algorithm stops when the specified number of generations has evolved;
Elapsed time - the genetic process will end when a specified time has elapsed;
No change in fitness - the genetic process will end if there is no change in the population's best fitness for a specified number of generations;
Stall time limit - the algorithm stops if there is no improvement in the objective function during an interval of time (in seconds) equal to the stall time limit.
The termination conditions are often used in combination. A genetic algorithm can be terminated after a certain number of generations, when a particular average fitness has been achieved, or when the fitness function does not change after a specific number of generations.
5.2 Proposed Algorithm
5.2.1 Individual Representation
The individual is represented by a sequence of processes to be executed. This sequence is divided into blocks whose size equals the number of processors. Figure 5.7 shows an example of an individual.
Figure 5.7: An example of a scheduling individual.
A block represents the processes that will be executed in parallel, each process on a processor. Each block also has an index that indicates an execution step (see Figure 5.7). Note that a block can contain fewer processes than its size. This occurs because of the dependencies among the processes. In the example, processes 1, 4, 6 and 8 depend on the ones indicated by 2 and 7. This lack of processes is called waiting time.
In our scheduling algorithm, the genetic operators affect only the processes. They modify where each process will be executed (on which processor) and at which execution step it will be executed.
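A minimal sketch, in Python, of how such an individual could be stored in memory follows; the concrete container and names are illustrative assumptions and do not come from the MPhyScaS implementation. Each execution step is a list indexed by processor, and None marks a waiting-time slot.

# Three processors (p0, p1, p2); each inner list is one execution step (block).
# None means the processor is idle at that step (waiting time caused by
# dependencies among the processes). The values are illustrative only.
individual = [
    [5, 3, 9],        # execution step 0
    [2, 7, 6],        # execution step 1
    [8, None, None],  # execution step 2: processors 1 and 2 wait
    [4, 1, None],     # execution step 3
]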
5.2.2 Selection and Elitism
In our problem, the individuals are selected through a fitness-based process, the fitness-proportionate selection method. This method is used to select the individuals that will be affected by both the crossover operator and the mutation operator. Each of these operators selects a proportion of the population; the percentage values of these proportions are given in Chapter 7. Besides this, the elitism concept is used. The percentage of elitism considered is also given in Chapter 7.
5.2.3 Crossover
The genetic algorithm proposed herein applies a new crossover operator to generate two new individuals for the next generation. It is a two-level crossover consisting of a gene-collection stage and a repeated-information swapping stage. In the gene-collection stage, the processors of the parents are divided into two partitions: even processors and odd processors. Each partition is given to a new offspring, so that the first offspring gets the information of the even processors of the first parent and the odd processors of the second parent, while the second offspring gets the information of the odd processors of the first parent and the even processors of the second parent. This process is shown in Figure 5.8, and a sketch of this stage is given after the figure.
Figure 5.8: First step of crossover individuals.
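A minimal Python sketch of this first (gene-collection) stage, assuming the list-of-steps representation sketched in Section 5.2.1; the second stage (swapping repeated jobs that occupy the same position) and the feasibility check are omitted here, and the function name is illustrative.

def partition_crossover(parent1, parent2):
    """First stage of the proposed crossover: offspring 1 takes the even
    processors of parent 1 and the odd processors of parent 2; offspring 2
    takes the complementary columns."""
    num_procs = len(parent1[0])
    child1, child2 = [], []
    for step1, step2 in zip(parent1, parent2):
        child1.append([step1[p] if p % 2 == 0 else step2[p] for p in range(num_procs)])
        child2.append([step2[p] if p % 2 == 0 else step1[p] for p in range(num_procs)])
    return child1, child2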
In the repeated-information swapping stage, repeated jobs associated with each offspring are searched for, and repeated jobs occupying the same position in the two new offspring are considered, as shown in Figure 5.9. The two new offspring swap these repeated jobs between them (Figure 5.9 also shows the result of this swapping based on the result obtained in Figure 5.8). This stage is important to increase the chance of an offspring being feasible.
Figure 5.9: Second step of crossover individuals.
Although this does not occur every time, when an offspring is not feasible we discard it, in order to have a new generation containing only feasible individuals. An example of a crossover that generates unfeasible offspring is depicted in Figure 5.10, which shows the first step of the crossover. The second step of the crossover is depicted in Figure 5.11.
Figure 5.10: First step of crossover individuals that generates unfeasible ospring.
One can note that in Figure 5.11 it is possible to swap processes 4 and 7, since they appear repeated in the first and second offspring, respectively. However, processes 1 and 2, which also appear repeated in the first and second offspring, respectively, cannot be swapped because they do not occupy the same position. As a consequence, the offspring generated after the second step of the crossover are not feasible.
Figure 5.11: Second step of crossover individuals that generates unfeasible ospring.
5.2.4 Mutation
The goal of the uniform mutation is to exchange two neighboring genes without violating the precedence relationship, in order to create an individual that could not have been produced by the crossover operator (KIM, 2007). The uniform mutation operates as follows: for each individual selected for this operation, an integer random number r is generated and two random processes in the rth execution step are swapped. This operator guarantees a modification in the communication cost values without violating the precedence of processes.
Figure 5.12 shows an example of this process. Here, we suppose that the integer random number r selected is 3, so the third execution step is the one that will be mutated. We then suppose that the numbers 0 and 2 are selected at random, indicating that the processes to be swapped are the one in the third execution step of processor 0 and the one in the third execution step of processor 2. These processes are marked in Figure 5.12, which also shows the result after the mutation.
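A minimal Python sketch of this mutation operator, again assuming the list-of-steps representation sketched in Section 5.2.1; the helper name is illustrative.

import random

def step_swap_mutation(individual, num_processors):
    """Swap two randomly chosen processor slots inside one random execution step."""
    mutant = [step[:] for step in individual]        # copy so the parent is kept
    r = random.randrange(len(mutant))                # execution step to mutate
    a, b = random.sample(range(num_processors), 2)   # two distinct processor slots
    mutant[r][a], mutant[r][b] = mutant[r][b], mutant[r][a]
    return mutant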
Figure 5.12: An example of the proposed mutation.
5.2.5 Fitness Function
Our scheduling algorithm seeks a trade-off between the execution time of the processes and the time necessary for these processes to communicate with each other. Besides that, we want to minimize the idle time. In other words, our scheduling algorithm aims to optimize three criteria together.
The first criterion to be optimized is the makespan (total finishing time), i.e. the completion time of the last job. The makespan function is represented by $C_{max} = \max\{C_i \mid i = 1, ..., n\}$, where $C_i$ is the finishing execution time of job $i$. The second criterion is the communication cost. We represent this function by $T$, where $T = \sum_{i,j} t_{ij}$, with $i, j$ ranging over pairs of processes that need to communicate and $t_{ij}$ being the time spent on that communication. The third criterion is the waiting time, which corresponds to the total idle time of the processors. The waiting time function is represented by $\sum_{j=1}^{n} WT_j$, i.e. the sum of the waiting time of each processor $j$.
Although in the MPhyScaS problem the optimization criterion is the minimization of these three functions, we combine the criteria in the following function:

$$\gamma = \omega_1 \cdot C_{max} + \omega_2 \cdot T + \omega_3 \cdot \sum_{j=1}^{n} WT_j,$$

where $\omega_1$, $\omega_2$ and $\omega_3$ are weights giving the relative importance of each function.
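A minimal Python sketch of this fitness function follows; the default weights correspond to the values 0.6, 0.3 and 0.1 used later in Chapter 7, and the argument names are illustrative.

def fitness(makespan, comm_cost, waiting_times, w1=0.6, w2=0.3, w3=0.1):
    """Weighted combination of makespan, communication cost and idle time.

    makespan      -- C_max, the finishing time of the last job
    comm_cost     -- T, total time spent on inter-process communication
    waiting_times -- iterable with the idle (waiting) time of each processor
    """
    return w1 * makespan + w2 * comm_cost + w3 * sum(waiting_times)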
Here, we present an example of an individual representation based on the proposed model. This GA individual representation is shown in Figure 5.13.
Figure 5.13: An example of the proposed individual representation.
Chapter 6
Case Study
In this chapter we describe the case study used to evaluate the scheduling algorithm pro-
posed in this dissertation.
6.1 Basics
Based on the example shown as a GA individual in Figure 5.13 in Chapter 5, we explain how the elements appear in the case study.
The Kernel is the first element to be approached in the case study. It is important to know that the Kernel requests a Block to execute an Algorithm. In order to make the Kernel reusable, the algorithms used by the Kernel are enumerated such that these virtual indexes are keys for a map (defined in Definition 6.1.1), and the user must indicate the actual algorithm for each virtual index.
Definition 6.1.1 A map is a function $M : V \rightarrow A$, where $V$ is the set of virtual indexes and $A$ is the set of actual indexes.
The map that does this association between virtual and actual algorithms for the Kernel is called the algorithm map. Based on our example, the way the Kernel makes this request to a Block is described as follows:

blockVector[0] → ExecuteAlgorithm(algorithmMap(0, 0))

The algorithm map can be represented as Table 6.1. Here, blockVector[0] indicates that our Block 0 is the required one. In algorithmMap(0, 0), the first number indicates the required Block and the second number indicates the virtual index of the algorithm that will be executed. Looking at Table 6.1, we see that the actual Algorithm associated with this virtual one is our Algorithm 0.
Table 6.1: Algorithm map of Kernel
Block Virtual Algorithm Actual Algorithm
0 0 0
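As an illustration only (the snippet below is a hypothetical Python rendering of Table 6.1, not the MPhyScaS code), the algorithm map can be thought of as a lookup table keyed by the Block and the virtual index:

# Hypothetical rendering of Table 6.1: (block, virtual algorithm) -> actual algorithm.
algorithm_map = {(0, 0): 0}

def resolve_algorithm(block, virtual_index):
    """Translate a (block, virtual index) pair into the actual algorithm index."""
    return algorithm_map[(block, virtual_index)]

print(resolve_algorithm(0, 0))   # the Kernel's request algorithmMap(0, 0) -> Algorithm 0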
The Algorithm, similarly to the Kernel, asks a Group to compute a QuantityTask. The QuantityTasks called by the Algorithm are enumerated such that these virtual indexes are keys for a map (called the task map), and the user must indicate the actual QuantityTask for each virtual index. Based on our example, the way the Algorithm makes two requests to the Group (asking for the computation of two different QuantityTasks) is described as follows:

groupVector[0] → ComputeQuantityTask(taskMap(0, 0))
groupVector[0] → ComputeQuantityTask(taskMap(0, 1))

The task map can be represented as Table 6.2. Here, groupVector[0] indicates that our Group 0 is the required one. In taskMap(0, 0), the first number indicates the required Group and the second number indicates the virtual index of the QuantityTask that will be computed. Looking at Table 6.2, we see that the actual QuantityTask associated with this virtual one is our QuantityTask 0. In the case of the second line, where the map is used as taskMap(0, 1), Table 6.2 shows that the QuantityTask associated with this virtual one is our QuantityTask 1.
Table 6.2: Task map of Algorithm
Group   Virtual QuantityTask   Actual QuantityTask
0       0                      0
0       1                      1
The next element to be approached is the QuantityTask, which is different from the others presented here. The QuantityTask is a set of GroupTasks. Each GroupTask asks a Phenomenon to execute a Quantity. Based on our example, the way the QuantityTasks, GroupTasks, Phenomena and Quantities are represented is described as follows:
1. QuantityTask 0 - Description of QuantityTask 0
(a) GroupTask 0 - type: assembler; assembler structure: Global State 0 (global-
State0Name)
i. Phenomenon 0 (phenomenon0Name)
A. Quantity 0 - coupling: none; parameters: none
2. QuantityTask 1 - Description of QuantityTask 1
(a) GroupTask 1 - type: assembler; assembler structure: Global State 1 (global-
State1Name)
i. Phenomenon 0 (phenomenon0Name)
A. Quantity 1 - coupling: Phenomenon 0 (phenomenon0Name) Global
State 0 (globalState0Name); parameters: none
One can note that the QuantityTasks are defined in the first level. In the second level, the GroupTasks of the related QuantityTask are defined. In the third and fourth levels appear the Phenomenon and the Quantity required by the related GroupTask. The coupling information of Quantity 1 is defined within the Quantity's information: it means that Quantity 1 can only assemble the structure indicated by the GroupTask after Global State 0 has already been assembled.
There are other types of GroupTasks: solver, operator, neummanBC, and essentialBC. For each type of GroupTask the information can change. In the case of neummanBC or essentialBC, a Quantity is required for execution (similarly to the assembler type presented). In the case of solver or operator, some additional information is provided, but a Quantity is not required for execution.
6.2 Kernel
The Kernel represents the overall scenario of the simulation. It is responsible for the initialization of procedures, for global time loops and iterations, for global adaptive iterations, and for the articulation of activities to be executed by the Blocks in the lower level. The Kernel stores system data related to the parameters for its loops and iterations.
The kernel code requires the usage of two maps: the algorithm map and the system data map. The first one associates different keys to algorithms. It is represented by algorithmMap(x, y), where x identifies the Block that will execute the algorithm, and y is a different key for each algorithm that will be executed in the kernel. The second map associates different keys to single data items that come from the system data. It is represented by systemDataMap(x), where x is a different key for each data item considered from the system data.
6.2.1 System Data
The system data is a structure responsible for maintaining the data that directly affects only the system. Examples of such data are data related to time, the time step, and some conditions. The system data is also useful for keeping the kernel and the algorithms in touch.
6.3 Global States
As explained in Chapter 2, the global states are global data structures such as matrices,
vectors and scalars. They are stored in a group due to the fact that the group is responsible
not only for assembling and solving algebraic systems, but also for operating with the
structures when requested by its block.
6.4 Algorithms
The algorithms request groups to execute quantityTasks, which is the level immediately after Group in the MPhyScaS architecture. All algorithms must have a structure called algorithm data, where some information regarding the algorithm can be stored. Besides this, all algorithms must use six maps in their code, as follows:
Task map - associates different keys to quantityTasks. Represented by taskMap(x, y), where x identifies the group that will execute the quantityTask, and y is a different key for each quantityTask that will be executed in the algorithm;
State map - associates different keys to global states. Represented by stateMap(x, y), where x identifies the group that will retrieve or update the global state, and y is a different key for each global state considered in the algorithm;
System data map - associates different keys to single data items that come from the system data. Represented by systemDataMap(x), where x is a different key for each data item considered from the system data;
Block algorithm map - associates different keys to other algorithms. Represented by blockAlgorithmMap(x), where x is a different key for each other algorithm (referenced in the algorithm in execution) that will be executed by the same block as the algorithm in execution;
Algorithm data map - associates different keys to single data items that come from the algorithm data of another algorithm. This map can only be nonempty if the block algorithm map is also nonempty. Represented by algorithmDataMap(x, y), where x identifies the other algorithm from which the algorithm data is considered (the value here is the same key that appears in the block algorithm map), and y is a different key for each data item considered from that algorithm data.
These maps are responsible for representing specific data without knowing its instance. Because of this, it is easier to reuse these algorithms. An example of the usage of the task map can be seen in line 1 of Pseudo-Code 5. The actual identifier of a quantityTask is directly associated with the keys of the map. Therefore, in the example, taskMap(0, 0) is associated with QuantityTask 0 of Group 0 according to the task map table. Some of the following algorithms do not present all six maps, which means that the maps not presented are empty.
6.5 QuantityTasks
Here, the quantityTasks of each group will be presented separately. They will be shown in a four-level scheme. In the first level is the quantityTask and an explanation of what it will compute. In the second level are the groupTasks executed by the quantityTask. Note that a quantityTask can execute more than one groupTask, hence it is possible to find more than one item in the second level. In the third level is the phenomenon requested by the groupTask. In this level, only one item can be found as a sub-item of a groupTask. And, in the fourth level are the quantities executed by the phenomenon.
The groupTasks that appear in the second level can be of four types. The first one is the assembler; this type of groupTask also presents the assembled structure. The second type is the boundary condition, which appears as neummanBC and essentialBC, and it also presents the assembled structure. The third type is the solver and the fourth type is the operator. All solver and operator groupTasks invoke the group phenomenon to execute their tasks, while the assembler and boundary condition groupTasks invoke a phenomenon other than the group phenomenon.
When the groupTask is an assembler or a boundary condition one, the quantity must present information about coupled states and the parameters necessary for its execution. When the groupTask is of solver type, the quantity must present three global states that will compose the algebraic linear system and the parameters necessary for its execution. And when the groupTask is of operator type, the quantity must present the type of the operation, the global states that will compose it, and also the parameters of the operation.
Chapter 7
Experiments and Results
This chapter presents the evaluation of the scheduling algorithm proposed here and a comparison with the algorithms presented in Chapter 3, the reference algorithms of the related works. For this purpose, we implemented the algorithms and generated two abstract simulators, which are DAGs of processes based on the MPhyScaS architecture, and scheduled them with these algorithms. Besides the abstract simulators, we also used the case study presented in Chapter 6 to compare the proposed algorithm with the List Scheduling, LPT and SPT algorithms.
For the experiments, we used combinations of 4 algorithms to be executed, 3 problems to solve (the DAGs representing the processes and their dependencies), and 2 architectures where the simulations would run. This represents six scenarios (3 x 2 = 6 scenarios, for problems and architectures) and twenty-four executions (6 x 4 = 24 executions, for scenarios and algorithms).
Two architectures were regarded by the algorithms: the first one is composed of 3 homogeneous, fully connected processors, while the second one is composed of 6 homogeneous processors that are also fully connected. This chapter presents the results for each algorithm applied to the six scenarios.
For the three reference algorithms and the proposed genetic algorithm, we ran each algorithm 30 times and calculated the mean and the standard deviation of these simulations. In the list scheduling algorithm we use the MPhyScaS architecture to determine the priority of the processes. Processes in the same level have the same priority: the Kernel level has the highest priority and the Quantity level has the lowest priority.
We used the data of ten simulations to estimate the number of samples required, and found that about nine samples are needed to calculate the mean value of the fitness function, assuming a confidence interval of 95%. Based on this estimation, we collected the arithmetic mean and the standard deviation for each scenario.
For the sequential simulation of the problems, the communication cost and the waiting time functions have value zero, and the makespan function has its maximum value. The results for sequential execution will also be presented for each problem.
In the experiments, we use the values 0.6, 0.3, and 0.1 for the parameters $\omega_1$, $\omega_2$, and $\omega_3$, respectively. These values represent the weight of each function (makespan, communication cost, and waiting time, respectively) that we wanted to take into consideration. The makespan function is the most important for this work, because the total execution time is what we want to minimize. We also want to minimize the communication cost and the waiting time, but they are less important than the makespan. The waiting time function is less important than the communication cost.
The population size is a critical parameter in genetic algorithms. The GA converges to poor solutions if the population is too small, and it spends unnecessary computational resources if the population is excessively large (LOBO; LIMA, 2005). For the proposed genetic algorithm, the simulations were done with population sizes of 50 and 100 individuals (based on (MOATTAR; RAHMANI; DERAKHSHI, 2007; KIM, 2007)), using a crossover probability of 0.9 and a mutation rate of 0.1.
7.1 The First Abstract Simulator
The first problem has 50 nodes (processes) and 61 edges (dependencies). These dependencies were found by analysing the Petri net model that corresponds to this simulator. A maximum of 8 levels of nesting is found in the dependencies of this graph.
The value obtained using the fitness function for the sequential execution of the first problem is 1446.6. For the architecture with 3 homogeneous processors, the results for the list scheduling, LPT and SPT rules are presented in Table 7.1, and Table 7.2 shows the results found by the proposed genetic algorithm for 500, 1000 and 2000 iterations. In both Tables 7.1 and 7.2, the percentage gain obtained over the sequential execution is shown for all algorithms. For the GA, the percentage gain is calculated with respect to the result obtained after 2000 iterations.
Table 7.1: Results for list scheduling, LPT and SPT using the first architecture
            List Scheduling   LPT         SPT
Mean        969.9200          1045.5100   1017.1700
S.D.        36.7211           37.2302     40.0182
Gain (%)    32.9517           27.7264     29.6854
Table 7.2: Results for GA using the first architecture
Iterations          Fitness (50 individuals)   Fitness (100 individuals)
500      Mean       842.2633                   833.4733
         S.D.       31.0516                    21.0644
1000     Mean       839.9333                   829.4833
         S.D.       29.8499                    24.1853
2000     Mean       837.4533                   828.8533
         S.D.       29.3756                    24.0546
Gain (%)            42.1088                    42.7033
The results in bold font are the best mean fitness values for each iteration count considered. Note that the improvement observed between 1000 and 2000 iterations for both population sizes is very small, so the extra computation is not necessary in this example. The improvement of the proposed algorithm can be monitored in Figure 7.1.
For the architecture with 6 homogeneous processors, the results for the list scheduling, LPT and SPT rules are presented in Table 7.3, and Table 7.4 shows the results found by the proposed genetic algorithm for 500, 1000 and 2000 iterations.
Table 7.3: Results for list scheduling, LPT and SPT using the second architecture
List Scheduling LPT SPT
Mean 1066.7400 1104.3200 1132.8300
S.D. 114.8803 120.1449 122.5676
Gain (%) 26.2588 23.6610 21.6902
Figure 7.1: The convergence of the proposed genetic algorithm for 50 and 100 individuals
using first architecture.
Table 7.4: Results for GA using the second architecture
Iterations          Fitness (50 individuals)   Fitness (100 individuals)
500      Mean       707.2600                   697.2100
         S.D.       37.0818                    36.6040
1000     Mean       689.2900                   686.1900
         S.D.       38.0706                    35.5946
2000     Mean       682.0100                   680.0800
         S.D.       36.3665                    32.2251
Gain (%)            52.8543                    52.9877
The improvement of the proposed algorithm is depicted in Figure 7.2.
The data presented show that the population with 100 individuals produces better schedules than the one with 50 individuals throughout all iterations. Considering the first architecture, at the 500th iteration the 100-individual population had already produced better schedules than the 50-individual population at the end of 2000 iterations. Note that in both situations the 100-individual population converged faster than the 50-individual population.
Figure 7.2: The convergence of the proposed genetic algorithm for 50 and 100 individuals
using second architecture.
7.2 The Second Abstract Simulator
The second problem has 126 nodes (processes) and 223 edges (dependencies), which were found from the Petri net model. This graph has a maximum of 16 levels of nesting. One can note that this implies the second problem is more difficult to solve than the first one.
The value obtained using the fitness function for the sequential execution of the second problem is 3510.0. For the architecture with 3 homogeneous processors, the results for the list scheduling, LPT and SPT rules are presented in Table 7.5, and Table 7.6 shows the results found by the proposed genetic algorithm for 500, 1000 and 2000 iterations. In both Tables 7.5 and 7.6, the percentage gain obtained over the sequential execution is shown for all algorithms. For the GA, the percentage gain is calculated with respect to the result obtained after 2000 iterations.
Table 7.5: Results for list scheduling, LPT and SPT using the first architecture
List Scheduling LPT SPT
Mean 2592.1100 2580.9700 2618.6800
S.D. 157.4361 154.3924 159.9462
Gain (%) 26.1507 26.4681 25.3937
Table 7.6: Results for GA using the first architecture
Iterations          Fitness (50 individuals)   Fitness (100 individuals)
500      Mean       2447.6730                  2445.8930
         S.D.       62.3827                    61.3071
1000     Mean       2399.9230                  2401.9930
         S.D.       59.5612                    74.9665
2000     Mean       2379.2830                  2385.7330
         S.D.       63.3114                    70.7054
Gain (%)            32.2141                    32.0304
The convergence over all iterations of the genetic algorithm results just presented can be seen in Figure 7.3.
Figure 7.3: The convergence of the proposed genetic algorithm for 50 and 100 individuals
using the first architecture.
For the second architecture (with 6 homogeneous processors), the results for the list scheduling, LPT and SPT rules are presented in Table 7.7, and Table 7.8 shows the results found by the proposed genetic algorithm for 500, 1000 and 2000 iterations.
The convergence over all iterations of the genetic algorithm results just presented can be seen in Figure 7.4.
Table 7.7: Results for list scheduling, LPT and SPT using the second architecture
List Scheduling LPT SPT
Mean 2944.8700 2817.8200 2842.7300
S.D. 158.5764 155.0172 160.9108
Gain (%) 16.1006 19.7202 19.0105
Table 7.8: Results for GA using the second architecture
Iterations          Fitness (50 individuals)   Fitness (100 individuals)
500      Mean       2254.2800                  2242.7870
         S.D.       74.0134                    62.4255
1000     Mean       2149.6600                  2140.3770
         S.D.       66.9043                    65.4855
2000     Mean       2077.6200                  2068.6470
         S.D.       67.2494                    59.3870
Gain (%)            40.8085                    41.0642
One can note that this second problem is more difficult to solve than the previous one, so both population sizes behaved in almost the same way. For this problem, having more individuals did not give the 100-individual population an advantage, due to the complexity of the problem.
7.3 Case Study
The case study has 150 nodes (processes) and 237 edges (dependencies). These dependencies were found by analysing the Petri net model that corresponds to this problem. A maximum of 12 levels of nesting is found in the dependencies of this graph.

Figure 7.4: The convergence of the proposed genetic algorithm for 50 and 100 individuals using the second architecture.

The value obtained using the fitness function for the sequential execution of the case study is 5680.8. For the architecture with 3 homogeneous processors, the results for the list scheduling, LPT and SPT rules are presented in Table 7.9, and Table 7.10 shows the results found by the proposed genetic algorithm for 500, 1000 and 2000 iterations. In both Tables 7.9 and 7.10, the percentage gain obtained over the sequential execution is shown for all algorithms. For the GA, the percentage gain is calculated with respect to the result obtained after 2000 iterations.
Table 7.9: Results for list scheduling, LPT and SPT using the first architecture
List Scheduling LPT SPT
Mean 5464.2300 5356.8000 5495.4900
S.D. 256.9011 255.3976 257.7374
Gain (%) 3.8123 5.7034 3.2620
The convergence over all iterations of the genetic algorithm results just presented can be seen in Figure 7.5.
For the second architecture (with 6 homogeneous processors), the results for the list scheduling, LPT and SPT rules are presented in Table 7.11, and Table 7.12 shows the results found by the proposed genetic algorithm for 500, 1000 and 2000 iterations.
The improvement of the proposed algorithm is depicted in Figure 7.6.
Considering the first architecture, at the 500th iteration the 100-individual population had already produced better schedules than the 50-individual population at the end of 2000 iterations. Note that in both situations the 100-individual population converged faster than the 50-individual population.
Table 7.10: Results for GA using the first architecture
Iterations          Fitness (50 individuals)   Fitness (100 individuals)
500      Mean       4925.2800                  4838.9400
         S.D.       101.4804                   111.0909
1000     Mean       4877.0000                  4796.0000
         S.D.       96.3841                    117.8795
2000     Mean       4852.7400                  4772.9200
         S.D.       102.7384                   111.5300
Gain (%)            14.5765                    15.9816
Table 7.11: Results for list scheduling, LPT and SPT using the second architecture
List Scheduling LPT SPT
Mean 6727.8600 6727.8000 6982.8900
S.D. 259.8337 258.9764 260.4492
Gain (%) 18.4315 18.4305 22.9209
Table 7.12: Results for GA using the second architecture
Iterations          Fitness (50 individuals)   Fitness (100 individuals)
500      Mean       5350.2400                  5294.6800
         S.D.       147.6926                   107.5116
1000     Mean       5269.4200                  5221.0000
         S.D.       148.4255                   107.9098
2000     Mean       5232.9200                  5181.1800
         S.D.       154.3999                   106.3545
Gain (%)            7.8841                     8.7949
Looking at the results for the second architecture, one can note that only the GA achieves any improvement. The other algorithms do not achieve improvement because the communication cost and the waiting time consume more time than is saved by the parallel execution.
Figure 7.5: The convergence of the proposed genetic algorithm for 50 and 100 individuals
using the first architecture.
Figure 7.6: The convergence of the proposed genetic algorithm for 50 and 100 individuals
using the second architecture.
Chapter 8
Conclusions and Future Works
8.1 Conclusions
This dissertation explores an important factor related to job parallelization: identifying parallel jobs and scheduling them. Scheduling algorithms explore the job dependencies in order to define a schedule near the optimal one. These algorithms consider only the makespan (total completion time) function to obtain the schedule. This process can become very complex when other effects are considered: the architecture where the simulations will run, the communication cost between jobs, and the waiting time of each processor.
Considering these other effects leads to a complex environment composed of a great quantity of variables. This makes the simple scheduling algorithms obsolete. In these circumstances, the algorithm proposed in this dissertation was modeled to be capable of incorporating these effects and of defining a schedule based on the costs associated with them.
The need for identifying the job dependencies eventually generated a new problem. A technique based on Coloured Petri Nets (CPN) was used for searching and identifying the dependencies between jobs. The CPN model is used to represent the MPhyScaS simulator and the simulation data. The model definition involves the representation of the MPhyScaS architecture objects and their relationships. The model is used to capture the causal and data dependencies between processes of an MPhyScaS simulator. A compiler was developed to transform the simulator and simulation data into a CP-net that follows the defined model.
The model definition allowed us to go through all reachable states of the generated CP-net. We used CPNTools to create the model, generate the reachable state graph, and analyze this graph to recognize dependencies between processes. These dependencies are used by the scheduling algorithms as one of their parameters.
For the scheduling problem, three different scheduling algorithms were initially implemented: List Scheduling, Longest Processing Time (LPT), and Shortest Processing Time (SPT). These algorithms take into consideration only the makespan function and the job dependencies. The first algorithm prioritizes the jobs with the highest priority, which is given by the user, to determine the schedule. The second one prioritizes the jobs with the longest processing time, while the last one prioritizes the jobs with the shortest processing time.
When dealing with MPhyScaS problems, other parameters are relevant: the architecture where the simulations will run, and the communication cost between jobs due to the large amount of data communicated. Although we could have adapted any of the developed algorithms, we preferred to use another technique. The proposed work suggests that complex problems involving many parameters (effects) can be solved using Computational Intelligence (CI) techniques. The study and improvement of these techniques demonstrated that the biggest challenge is not in the problem itself but in the art of using the techniques: the capacity to visualize what each technique has to offer and to use this to contribute to the problem solution. Hence, we decided to develop a Genetic Algorithm (GA). Applying a Genetic Algorithm to scheduling problems with a well-defined hierarchy proved to be very interesting and valuable.
Due to the complexity of the MPhyScaS problem, the usage of search algorithms such as GA is essential to find a good job schedule. Hence, for each new approach (new effects to be considered) a new search for a schedule near the optimal should be done. Still, the algorithm is able to adapt and solve the problem with the same efficiency, even if there are changes in the architecture where the simulation will run or if there are new effects to be considered.
Applying the GA to the MPhyScaS problem, we obtained relevant results in comparison with the other algorithms implemented. In the application of the GA to the first generated problem, the results obtained are better than those of the other three algorithms, even when we decrease the GA population size from 100 to 50 individuals. When applied to the second generated problem, the GA obtained equivalent results for the different population sizes (100 and 50 individuals), but still better than the other algorithms. For the case study, the GA also obtained better results. The behavior of the GA was almost identical for the different population sizes in the second problem. This was expected, since the second problem has many dependencies, which leaves the GA with few scheduling options to search.
The fact that the GA is able to incorporate and consider several effects while performing the job scheduling makes this technique useful for MPhyScaS. The main reason for this is the little time spent by the GA to find a schedule (about 2 minutes for the presented problems) in comparison with the time gained in the simulation. Thus, one can conclude that the proposed work is a significant contribution to the areas of system modeling, parallel processing and scheduling. It also contributes to the area of multi-physics simulation.
8.2 Future Works
If we analyze other works that use Genetic Algorithms, we note that the way the GA parameters are used has an important impact on the algorithm performance. In each GA iteration, the population passes through selection, part of the population passes through the crossover task, and another part passes through the mutation task. In the proposed GA, a study to optimize the parameters related to these tasks can be performed, thus shortening the time needed to converge to a satisfactory solution (schedule).
The parameters used to balance the importance of the effects in the fitness function may also be optimized. These parameters indicate how much each effect contributes to the solution, and their optimization may provide better results. Other Computational Intelligence techniques should be studied to determine which technique is the most appropriate for this.
The proposed GA has demonstrated that it is able to incorporate several other effects that influence the search for a better schedule. Therefore, new parameters (effects) should be incorporated in order to contribute to a better schedule choice. Hence, new experiments involving other effects should be done so that, by analyzing the results, the quality of the solutions found can be improved.
Other simulation settings involving the MPhyScaS problem may be tried out in order to analyze and extend the application of the GA. Some simulations can be done to verify the influence of the variables according to the problems.
Another important point that should be studied is how to measure the cost of each MPhyScaS job. In the case study presented, the measurements of the computation costs are based on some evidence available in the simulation, such as the mesh size and the nodal dimension of the phenomenon. A study should be done in order to obtain more realistic measures.
APPENDIX A
Case Study
A.1 Kernel Algorithm
The pseudo-code of the kernel used in the case study is presented in Pseudo-Code 4. Lines 1 and 2 of Pseudo-Code 4 initialize Block 0 and Block 1, respectively. In line 3, the kernel initializes the time data, computing the initial time step of the simulation.
From line 6 to line 14 the iteration itself occurs, until the time interval of the simulation is reached. Basically, this kernel executes an algorithm, verifies the returned condition, updates the time data and stores the results of the iteration so they can be post-processed. The execution of the algorithm occurs in line 7, where the algorithm that performs the U iteration is called. Lines 8 to 11 verify the condition returned by the executed algorithm. In line 12 an algorithm to update the time data is executed. Then, the results obtained in the iteration are stored so that the post-processing phenomenon can present them to the user. The algorithm map and the system data map related to this kernel are shown in Table A.1 and Table A.2, respectively.
Pseudo-Code 4: Kernel algorithm pseudo-code.
1   blockVector[0] → ExecuteAlgorithm(algorithmMap(0,0))  // Initialize Block 0
2   blockVector[1] → ExecuteAlgorithm(algorithmMap(1,0))  // Initialize Block 1
3   blockVector[0] → ExecuteAlgorithm(algorithmMap(0,1))  // Initialize time data. It calls an algorithm for initial time step computation
4   double tn = systemData → GetData(systemDataMap(0))
5   double interval = systemData → GetData(systemDataMap(1))
6   while tn ≤ interval do
7       blockVector[0] → ExecuteAlgorithm(algorithmMap(0,2))  // Iteration of U
8       int condition = systemData → GetData(systemDataMap(2))  // Iteration condition
9       if condition < 0 then
10          return condition
11      end
12      blockVector[0] → ExecuteAlgorithm(algorithmMap(0,3))  // Update time data. It calls an algorithm for time step computation
13      blockVector[1] → ExecuteAlgorithm(algorithmMap(1,1))  // Store results for post-processing phenomenon
14  end
A.2 System Data
The system data of this case study are shown in Table A.3.
A.3 Global States
The global states of this case study are presented in Table A.4 and they are separated by
groups.
Table A.1: Algorithm map of Kernel algorithm
Block   Algorithm   Actual Algorithm
0       0           0
0       1           7
0       2           2
0       3           8
1       0           5
1       1           6
Table A.2: System data map of Kernel algorithm
Index Data
0 1 (tn)
1 3 (interval)
2 4 (condition)
Table A.3: System Data
Index Data
0 dt
1 tn
2 tnp1
3 interval
4 condition
A.4 Algorithms
A.4.1 Algorithm 0
Algorithm 0 initializes two global states: Un and Cn. Hence, it computes two quanti-
tyTasks. The first one is executed by Group 0 and initializes Un as shown in line 1 of
Pseudo-Code 5. The second quantityTask is executed by Group 1 and initializes Cn as
shownin line 2 of Pseudo-Code 5. The task map of Algorithm 0 is presented in Table A.5.
93
Table A.4: Global States for each Group
Group   Code   Global State
0       0      Un
0       1      Unp1
0       2      Uk
0       3      Ukp1
0       4      dU
0       5      errU
0       6      gSystemData0
0       7      trSig
0       8      K
0       9      F
0       20     trS
0       22     aux1
1       10     Cn
1       11     Cnp1
1       12     Ck
1       13     Ckp1
1       14     dC
1       15     errC
1       16     gSystemData1
1       17     M
1       18     G
1       19     Kc
1       21     aux0
1       23     aux2
1       24     aux3
2       25     gSystemData2
Pseudo-Code 5: Algorithm 0 pseudo-code.
1   groupVector[0] → ComputeQuantityTask(taskMap(0,0))  // Initialize Un
2   groupVector[1] → ComputeQuantityTask(taskMap(1,0))  // Initialize Cn
Table A.5: Task map of Algorithm 0
Group QuantityTask Actual QuantityTask
0 0 0
1 0 0
A.4.2 Algorithm 1
Algorithm 1 computes the initial time step and updates the system group's time data in the global state gSystemData2. Lines 1 to 3 of Pseudo-Code 6 compute the minimum time step at each group; in this case, these groups are Group 0 and Group 1. In line 4 of the same algorithm, a quantityTask is executed to compute the minimum time step among these groups. After that, the system data is updated with the value of the time step stored in the global state gSystemData2, as can be seen in lines 5 and 6 of Pseudo-Code 6.
The task map, state map and system data map of Algorithm 1 are presented in Table A.6, Table A.7 and Table A.8, respectively.
Pseudo-Code 6: Algorithm 1 pseudo-code.
1   for int i = 0; i < 2; i++ do
2       grpVec[i] → ComputeQuantityTask(taskMap(i,0))  // Compute minimum time step at each group (Group 0, Group 1)
3   end
4   groupVector[2] → ComputeQuantityTask(taskMap(2,0))  // Compute minimum time step among Groups 0, 1. There is no modification on time step data for groups other than the SystemGroup
5   double[] timeData = groupVector[2] → RetrieveState(stateMap(2,0))  // Retrieve time data vector
6   systemData → Update(systemDataMap(0), timeData[0])  // Updates time step in System Data
A.4.3 Algorithm 2
Algorithm 2 represents U iteration. Lines 1 and 2 of Pseudo-Code 7 initialize vector
fields for U iteration and Picard iteration respectively. In line 5, Algorithm 2 informs to
95
Table A.6: Task map of Algorithm 1
Group QuantityTask Actual QuantityTask
0 0 1
1 0 1
2 0 0
Table A.7: State map of Algorithm 1
Group State Actual State
2 0 25 (gSystemData2)
Table A.8: System data map of Algorithm 1
Index Data
0 0 (dt)
the algorithm it executes not to initialize its vector fields. This is because Algorithm 2 has
already initialize them (line 2).
From line 1 to line 15 of Pseudo-Code 8 the iteration itself runs until errU satisfies
the tolerance or the number of iterations reaches the maximum value permitted. Lines 2
to 4 solve the algebraic linear system: line 2 assembles the matrix and the strength
vector, line 3 introduces boundary conditions and line 4 solves the system, computing
Ukp1. The trace of the strain tensor is computed in line 5. Algorithm 2 executes another
algorithm in line 6. If this other algorithm returns a value less than zero, Algorithm 2
returns this value (line 9), since it indicates an error. Otherwise, Algorithm 2 computes
errU in order to decide whether the iteration will continue.
When the iteration finishes, Algorithm 2 verifies whether the number of executed iterations
reached the maximum. If so, the algorithm did not converge and an error is returned
(lines 17 and 18). Otherwise, the algorithm converged successfully. In lines 21 and 22,
Unp1 and Cnp1 are updated for the next execution.
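Written out explicitly, the stopping quantity computed in line 11 of Pseudo-Code 8 (and
detailed as QuantityTask 7 of Group 0 in Section A.5.1) is the relative change of the U
iterate, measured in the Euclidean norm used by the vecNorm2 operator (here U_k and
U_{k+1} stand for the global states Uk and Ukp1):

\[
  \mathit{errU} \;=\; \frac{\lVert U_{k+1} - U_{k} \rVert_{2}}{\lVert U_{k+1} \rVert_{2}} .
\]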
The task map, state map, system data map, algorithm data, block algorithm map and
algorithm data map of Algorithm 2 are presented in Table A.9, Table A.10, Table A.11,
Table A.12, Table A.13 and Table A.14 respectively.
Pseudo-Code 7: Algorithm 2 pseudo-code (first part).
1: groupVector[0] ComputeQuantityTask(taskMap(0,0))  // Initialize vector fields for iteration of U. Uk = Un
2: groupVector[1] ComputeQuantityTask(taskMap(1,0))  // Initialize vector fields for Picard iteration
3: double tnp1 = systemDataGetData(systemDataMap(0))
4: double errU = 1
5: blockAlgorithmMap(0) SetAlgorithmData(algorithmDataMap(0,0), 0)  // Do not perform initialization
6: double tolU = algorithmData[0]
7: int maximumIterations = algorithmData[1]
8: int iteration = 0
Pseudo-Code 8: Algorithm 2 pseudo-code (second part).
1: while errU > tolU && iteration < maximumIterations do
2:     groupVector[0] ComputeQuantityTask(taskMap(0,1))  // Assembles matrix and strength vector. F needs Ckp1, tnp1.
3:     groupVector[0] ComputeQuantityTask(taskMap(0,2))  // Introduces boundary conditions
4:     groupVector[0] ComputeQuantityTask(taskMap(0,3))  // Solves the system. Computes Ukp1
5:     groupVector[0] ComputeQuantityTask(taskMap(0,4))  // Computes tr(σ)
6:     thisBlock ExecuteAlgorithm(blockAlgorithmMap(0))  // Picard iteration. Computes Ckp1
7:     int cond = blockAlgorithmMap(0) GetAlgorithmData(algorithmDataMap(0,1))  // Retrieves Picard result
8:     if cond < 0 then
9:         return cond
10:    end
11:    groupVector[0] ComputeQuantityTask(taskMap(0,5))  // Computes errU. errU = ||Ukp1 - Uk|| / ||Ukp1||
12:    errU = groupVector[0] RetrieveState(stateMap(0,0))
13:    groupVector[0] ComputeQuantityTask(taskMap(0,6))  // Uk = Ukp1
14:    iteration++
15: end
16: if iteration >= maximumIterations then
17:    algorithmData[2] = failureCodes[0]
18:    return 0
19: end
20: algorithmData[2] = successCodes[0]
21: groupVector[0] ComputeQuantityTask(taskMap(0,7))  // Unp1 = Uk
22: groupVector[1] ComputeQuantityTask(taskMap(1,1))  // Cnp1 = Ck
23: return 1
Table A.9: Task map of Algorithm 2
Group  QuantityTask  Actual QuantityTask
0      0             2
0      1             3
0      2             4
0      3             5
0      4             6
0      5             7
0      6             10
0      7             11
1      0             2
1      1             10
Table A.10: State map of Algorithm 2
Group  State  Actual State
0      0      5 (errU)
0      1      2 (Uk)
0      2      3 (Ukp1)
0      3      1 (Unp1)
1      0      11 (Cnp1)
1      1      12 (Ck)
Table A.11: System data map of Algorithm 2
Index Data
0 2 (tnp1)
Table A.12: Algorithm data of Algorithm 2
Index  Data
0      tolU
1      maximum iterations
2      failure/success code
Table A.13: Block algorithm map of Algorithm 2
Index  Algorithm
0      3
Table A.14: Algorithm data map of Algorithm 2
Algorithm  Index  Actual Index
0          0      3 (initialization)
0          1      2 (condition)
A.4.4 Algorithm 3
Algorithm 3 represents the Picard iteration. Lines 1 to 3 of Pseudo-Code 9 initialize the
vector fields if the corresponding flag is set. In this case, the vector fields are not
initialized, because they were already initialized by Algorithm 2.
From line 11 to line 21 the iteration itself runs until errC satisfies the tolerance or the
number of iterations reaches the maximum value permitted. Line 12 shows the quantityTask
that computes the trace of S. Lines 13 to 15 solve the algebraic linear system: line 13
assembles the matrix and the strength vector, line 14 introduces boundary conditions and
line 15 solves the system, computing Ckp1. Lines 16 and 17 prepare data for the errC
computation (line 18), which decides whether the iteration will continue.
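In the same notation as before, the residue prepared in line 16 and the stopping quantity of
line 18 of Pseudo-Code 9 (QuantityTasks 6 and 7 of Group 1 in Section A.5.2) can be written
as (with C_k and C_{k+1} standing for the global states Ck and Ckp1):

\[
  R_{C} \;=\; M\,C_{k+1} - G ,
  \qquad
  \mathit{errC} \;=\; \frac{\max\{\, \lVert R_{C} \rVert_{2},\; \lVert C_{k+1} - C_{k} \rVert_{2} \,\}}{\lVert C_{k+1} \rVert_{2}} .
\]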
When the iteration finishes, Algorithm 3 verifies whether the number of executed iterations
reached the maximum. If so, the algorithm did not converge and an error is returned (lines
2 and 3 of Pseudo-Code 10). Otherwise, the algorithm converged successfully (lines 6
and 7). In line 5 Cnp1 is updated for the next execution.
The task map, state map, system data map and algorithm data of Algorithm 3 are
presented in Table A.15, Table A.16, Table A.17 and Table A.18 respectively.
Pseudo-Code 9: Algorithm 3 pseudo-code.
1: if algorithmData[3] then
2:     groupVector[1] ComputeQuantityTask(taskMap(1,0))  // Ck = Ckp1
3: end
4: double dt = systemDataGetData(systemDataMap(0))
5: double tnp1 = systemDataGetData(systemDataMap(1))
6: double tn = tnp1 - dt
7: double tolC = algorithmData[0]
8: int maximumIterations = algorithmData[1]
9: double errC = 1
10: int iteration = 0
11: while errC > tolC && iteration < maximumIterations do
12:     groupVector[0] ComputeQuantityTask(taskMap(0,0))  // Computes tr(S). trS needs trSig, Ck and the parameters C0, e
13:     groupVector[1] ComputeQuantityTask(taskMap(1,1))  // Assembles matrix and strength vector. M needs trSig, Ck, dt. G needs dt, Cn
14:     groupVector[1] ComputeQuantityTask(taskMap(1,2))  // Introduces boundary conditions
15:     groupVector[1] ComputeQuantityTask(taskMap(1,3))  // Solves the system. Computes Ckp1
16:     groupVector[1] ComputeQuantityTask(taskMap(1,4))  // Prepares data for residue computation. RC = M · Ckp1 - G
17:     groupVector[1] ComputeQuantityTask(taskMap(1,2))  // Introduces boundary conditions
18:     groupVector[1] ComputeQuantityTask(taskMap(1,5))  // Computes errC. errC = max{||RC||, ||Ckp1 - Ck||} / ||Ckp1||
19:     groupVector[1] ComputeQuantityTask(taskMap(1,0))  // Ck = Ckp1
20:     iteration++
21: end
Pseudo-Code 10: Algorithm 3 pseudo-code.
1: if iteration >= maximumIterations then
2:     algorithmData[2] = failureCodes[0]
3:     return 0
4: end
5: groupVector[1] ComputeQuantityTask(taskMap(1,6))  // Cnp1 = Ckp1
6: algorithmData[2] = successCodes[0]
7: return 1
Table A.15: Task map of Algorithm 3
Group  QuantityTask  Actual QuantityTask
0      0             12
1      0             11
1      1             3
1      2             4
1      3             5
1      4             6
1      5             7
1      6             10
Table A.16: State map of Algorithm 3
Group  State  Actual State
1      0      12 (Ck)
1      1      13 (Ckp1)
1      2      11 (Cnp1)
A.4.5 Algorithm 4
Algorithm 4 computes the minimum time step among the groups for each iteration of the
loop and updates the system group's time data in its gSystemData. The pseudo-code of this
algorithm is shown in Pseudo-Code 11.
Table A.17: System data map of Algorithm 3
Index Data
0 0 (dt)
1 2 (tnp1)
Table A.18: Algorithm data of Algorithm 3
Index Data
0 tolC
1 maximum iterations
2 failure/success code
3 initialization
In lines 1 to 3 each group computes its time step. In line 4, the system group (Group 2 in
this case study) computes the minimum time step among all other groups. Note that this
task does not modify the time step data of any group but the system group. The time step
value is then updated in the system data (line 6).
The task map, state map and system data map of Algorithm 4 are presented in Table A.19,
Table A.20 and Table A.21 respectively.
Pseudo-Code 11: Algorithm 4 pseudo-code.
1: for int i = 0; i < 2; i++ do
2:     groupVector[i] ComputeQuantityTask(taskMap(i,0))  // Computes minimum time step at each group (Group 0, Group 1)
3: end
4: groupVector[2] ComputeQuantityTask(taskMap(2,0))  // Computes minimum time step among Groups 0, 1. There is no modification on time step data for other groups but SystemGroup
5: double[] timeData = groupVector[2] RetrieveState(stateMap(2,0))  // Retrieves time data vector
6: systemDataUpdate(systemDataMap(0), timeData[0])  // Updates time step in System Data
Table A.19: Task map of Algorithm 4
Group QuantityTask Actual QuantityTask
0 0 8
1 0 8
2 0 0
Table A.20: State map of Algorithm 4
Group State Actual State
2 0 25 (gSystemData2)
Table A.21: System data map of Algorithm 4
Index Data
0 0 (dt)
A.4.6 Algorithm 5
Algorithm 5 retrieves initial data from the simulation so that they can be post-processed
and presented to the user. Algorithm 5 (pseudo-code presented in Pseudo-Code 12) consists
of the execution of a single QuantityTask (line 1 of Pseudo-Code 12). This algorithm has
only a task map, which is presented in Table A.22.
Pseudo-Code 12: Algorithm 5 pseudo-code.
1: groupVector[2] ComputeQuantityTask(taskMap(2,0))  // Retrieves initial data from the simulation
Table A.22: Task map of Algorithm 5
Group QuantityTask Actual QuantityTask
2 0 0
A.4.7 Algorithm 6
Algorithm 6 retrieves data during the simulation so that they can be post-processed and
presented to the user. Through this algorithm the user can monitor the evolution of the
simulation data. Algorithm 6 (pseudo-code presented in Pseudo-Code 13) consists of the
execution of a single QuantityTask (line 1 of Pseudo-Code 13). This algorithm has only a
task map, which is presented in Table A.23.
Pseudo-Code 13: Algorithm 6 pseudo-code.
1: groupVector[2] ComputeQuantityTask(taskMap(2,0))  // Retrieves data during the simulation
Table A.23: Task map of Algorithm 6
Group QuantityTask Actual QuantityTask
2 0 1
A.4.8 Algorithm 7
Algorithm 7 initializes the time data in each group using data from the system group.
First, the system group (Group 2) initializes its time data, that is, it sets the values of
tn, tnp1 and dt equal to zero (line 1 of Pseudo-Code 14). Then, line 2 shows that another
algorithm is called in order to obtain the minimum time step among the groups. All groups
but the system group update their time data with the value obtained by that algorithm
(lines 3 to 5), and the system data is updated with the same values (lines 7 to 9).
The task map, state map, system data map and block algorithm map of Algorithm 7 are
presented in Table A.24, Table A.25, Table A.26 and Table A.27 respectively.
Pseudo-Code 14: Algorithm 7 pseudo-code.
1: groupVector[2] ComputeQuantityTask(taskMap(2,0))  // Initialize all time data. dt = tn = tnp1 = 0
2: thisBlock ExecuteAlgorithm(blockAlgorithmMap(0))  // Calls an algorithm for initial time step computation
3: for int i = 0; i < 2; i++ do
4:     groupVector[i] ComputeQuantityTask(taskMap(i,0))  // Updates time data for Groups 0, 1 from SystemGroup
5: end
6: double[] timeData = groupVector[2] RetrieveState(stateMap(2,0))
7: systemDataUpdate(systemDataMap(0), timeData[0])  // dt
8: systemDataUpdate(systemDataMap(1), timeData[1])  // tn
9: systemDataUpdate(systemDataMap(2), timeData[2])  // tnp1
Table A.24: Task map of Algorithm 7
Group QuantityTask Actual QuantityTask
0 0 9
1 0 9
2 0 2
Table A.25: State map of Algorithm 7
Group State Actual State
2 0 25 (gSystemData2)
Table A.26: System data map of Algorithm 7
Index Data
0 0 (dt)
1 1 (tn)
2 2 (tnp1)
Table A.27: Block algorithm map of Algorithm 7
Index Algorithm
0 1
A.4.9 Algorithm 8
Algorithm 8 updates the time data in each group using data from the system group. First,
another algorithm is called in order to obtain the minimum time step among the groups
(line 1 of Pseudo-Code 15). Then, line 2 shows that the system group (Group 2) updates
its time data. All groups but the system group update their time data with the value of
the system group's time data (lines 3 to 5), and the system data is updated with the same
values (lines 7 to 9).
The task map, state map, system data map and block algorithm map of Algorithm 8
are presented in Table A.28, Table A.29, Table A.30 and Table A.31 respectively.
Pseudo-Code 15: Algorithm 8 pseudo-code.
1: thisBlock ExecuteAlgorithm(blockAlgorithmMap(0))  // Calls an algorithm for time step computation. Final time step is in SystemGroup only
2: groupVector[2] ComputeQuantityTask(taskMap(2,0))  // Updates time data for SystemGroup
3: for int i = 0; i < 2; i++ do
4:     groupVector[i] ComputeQuantityTask(taskMap(i,0))  // Updates time data for Groups 0, 1 from SystemGroup
5: end
6: double[] timeData = groupVector[2] RetrieveState(stateMap(2,0))
7: systemDataUpdate(systemDataMap(0), timeData[0])  // dt
8: systemDataUpdate(systemDataMap(1), timeData[1])  // tn
9: systemDataUpdate(systemDataMap(2), timeData[2])  // tnp1
Table A.28: Task map of Algorithm 8
Group QuantityTask Actual QuantityTask
0 0 9
1 0 9
2 0 1
Table A.29: State map of Algorithm 8
Group State Actual State
2 0 25 (gSystemData2)
Table A.30: System data map of Algorithm 8
Index Data
0 0 (dt)
1 1 (tn)
2 2 (tnp1)
Table A.31: Block algorithm map of Algorithm 8
Index Algorithm
0 4
A.5 QuantityTasks
A.5.1 QuantityTasks for Group 0
1. QuantityTask 0 - Initialize Un
(a) GroupTask 0 - type: assembler; assembler structure: Global State 0 (Un)
i. Phenomenon 1 (Elasticity)
A. Quantity 0 - coupling: none; parameters: none
2. QuantityTask 1 - Computes initial dt of this group
(a) GroupTask 1 - type: assembler; assembler structure: Global State 6 (gSystemData0)
i. Phenomenon 1 (Elasticity)
A. Quantity 1 - coupling: none; parameters: none
3. QuantityTask 2 - Initialize Uk for global solution iteration
(a) GroupTask 2 - type: operator
i. Phenomenon 0 (groupPhenomenon0)
A. Data - operation type: swap; operation structures: Global State 2
(Uk), Global State 0 (Un); parameters: none
4. QuantityTask 3 - Assembles matrix K and strength vector F for solving Ukp1
(a) GroupTask 3 - type: assembler; assembler structure: Global State 8 (K)
i. Phenomenon 1 (Elasticity)
A. Quantity 2 - coupling: none; parameters: none
(b) GroupTask 4 - type: assembler; assembler structure: Global State 9 (F)
i. Phenomenon 1 (Elasticity)
A. Quantity 3 - coupling: Phenomenon 0 (groupPhenomenon0) Global
State 6 (gSystemData0); parameters: none
B. Quantity 4 - coupling: Phenomenon 4 (Diffusion) Global State 12
(Ck); parameters: none
5. QuantityTask 4 - Introduces boundary conditions, modifying K and F
(a) GroupTask 5 - type: neummanBC; assembler structure: Global State 9 (F)
i. Phenomenon 1 (Elasticity)
A. Quantity 5.a - coupling: Phenomenon 0 (groupPhenomenon0) Global State 6 (gSystemData0); parameters: Neumann condition = 0 (complete Neumann condition)
B. Quantity 5.b - coupling: Phenomenon 0 (groupPhenomenon0) Global State 6 (gSystemData0); parameters: Neumann condition = 1 (Neumann condition in tangential direction)
C. Quantity 5.c - coupling: Phenomenon 0 (groupPhenomenon0) Global State 6 (gSystemData0); parameters: Neumann condition = 2 (Neumann condition in normal direction)
(b) GroupTask 6 - type: essentialBC; assembler structure: Global State 8 (K), Global State 9 (F)
i. Phenomenon 1 (Elasticity)
A. Quantity 6.a - coupling: Phenomenon 0 (groupPhenomenon0) Global State 6 (gSystemData0); parameters: Dirichlet condition = 0 (complete Dirichlet condition)
B. Quantity 6.b - coupling: Phenomenon 0 (groupPhenomenon0) Global State 6 (gSystemData0); parameters: Dirichlet condition = 1 (Dirichlet condition in tangential direction)
C. Quantity 6.c - coupling: Phenomenon 0 (groupPhenomenon0) Global State 6 (gSystemData0); parameters: Dirichlet condition = 2 (Dirichlet condition in normal direction)
6. QuantityTask 5 - Computes Ukp1 using matrix K and strength vector F
(a) GroupTask 7 - type: solver
i. Phenomenon 0 (groupPhenomenon0)
A. Data - operation type: solver; operation structures: Global State 8
(K), Global State 9 (F), Global State 3 (Ukp1); parameters: depends
on the solver
7. QuantityTask 6 - Computes trSig (trace of the strain tensor)
(a) GroupTask 8 - type: assembler; assembler structure: Global State 7 (trSig)
i. Phenomenon 1 (Elasticity)
A. Quantity 7 - coupling: Phenomenon 1 (Elasticity) Global State 3
(Ukp1); parameters: none
8. QuantityTask 7 - Computes errU (see the sketch after this list)
(a) GroupTask 9 - type: operator
i. Phenomenon 0 (groupPhenomenon0)
A. Data - operation type: subtract; operation structures: Global State 2
(Uk), Global State 3 (Ukp1), Global State 2 (Uk); parameters: none
(b) GroupTask 10 - type: operator
i. Phenomenon 0 (groupPhenomenon0)
A. Data - operation type: vecNorm2; operation structures: Global State
5 (errU), Global State 2 (Uk); parameters: none
(c) GroupTask 11 - type: operator
i. Phenomenon 0 (groupPhenomenon0)
A. Data - operation type: vecNorm2; operation structures: Global State
22 (aux1), Global State 3 (Ukp1); parameters: none
(d) GroupTask 12 - type: operator
i. Phenomenon 0 (groupPhenomenon0)
A. Data - operation type: scalarDivide; operation structures: Global
State 5 (errU), Global State 5 (errU), Global State 22 (aux1); pa-
rameters: none
9. QuantityTask 8 - Computes dt of this group after the beginning
(a) GroupTask 13 - type: assembler; assembler structure: Global State 6 (gSystemData0)
i. Phenomenon 1 (Elasticity)
A. Quantity 8 - coupling: none; parameters: none
10. QuantityTask 9 - Updates gSystemData0
(a) GroupTask 14 - type: assembler; assembler structure: Global State 6 (gSystemData0)
i. Phenomenon 1 (Elasticity)
A. Quantity 9 - coupling: Phenomenon 4 (groupPhenomenon2) Global
State 25 (gSystemData2); parameters: none
11. QuantityTask 10 - Updates Uk value for the next iteration
(a) GroupTask 15 - type: operator
i. Phenomenon 0 (groupPhenomenon0)
A. Data - operation type: swap; operation structures: Global State 2
(Uk), Global State 3 (Ukp1); parameters: none
12. QuantityTask 11 - Updates Unp1 value for the next iteration
(a) GroupTask 16 - type: operator
i. Phenomenon 0 (groupPhenomenon0)
A. Data - operation type: swap; operation structures: Global State 1
(Unp1), Global State 2 (Uk); parameters: none
13. QuantityTask 12 - Computes trS
(a) GroupTask 17 - type: assembler; assembler structure: Global State 20 (trS)
i. Phenomenon 1 (Elasticity)
A. Quantity 10 - coupling: Phenomenon 1 (Elasticity) Global State 7
(trSig); parameters: none
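The sketch announced in QuantityTask 7 above makes the chain of operator GroupTasks
concrete: the same four steps (subtract, two vecNorm2 and one scalarDivide) are reproduced
below on plain std::vector data. The helper name vecNorm2 and the sample values are
illustrative assumptions; only the sequence of operations and the in-place overwrite of Uk
follow the GroupTask descriptions above.

#include <cmath>
#include <iostream>
#include <vector>

// Euclidean norm, the role played by the vecNorm2 operator.
double vecNorm2(const std::vector<double>& v) {
    double s = 0.0;
    for (double x : v) s += x * x;
    return std::sqrt(s);
}

int main() {
    std::vector<double> Uk   = {1.00, 2.00, 3.00};  // Global State 2 (Uk)
    std::vector<double> Ukp1 = {1.01, 1.98, 3.02};  // Global State 3 (Ukp1)

    // GroupTask 9 (subtract): Uk := Ukp1 - Uk, overwriting Uk with the difference.
    for (std::size_t i = 0; i < Uk.size(); ++i) Uk[i] = Ukp1[i] - Uk[i];

    // GroupTask 10 (vecNorm2): errU := norm of the difference.
    double errU = vecNorm2(Uk);

    // GroupTask 11 (vecNorm2): aux1 := norm of Ukp1.
    double aux1 = vecNorm2(Ukp1);

    // GroupTask 12 (scalarDivide): errU := errU / aux1.
    errU /= aux1;

    std::cout << "errU = " << errU << "\n";
    return 0;
}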
A.5.2 QuantityTasks for Group 1
1. QuantityTask 0 - Initialize Cn
(a) GroupTask 0 - type: assembler; assembler structure: Global State 10 (Cn)
i. Phenomenon 3 (Diffusion)
A. Quantity 0 - coupling: none; parameters: none
2. QuantityTask 1 - Computes initial dt of this group
(a) GroupTask 1 - type: assembler; assembler structure: Global State 16 (gSystemData1)
i. Phenomenon 3 (Diffusion)
A. Quantity 1 - coupling: none; parameters: none
3. QuantityTask 2 - Initialize Ck
(a) GroupTask 2 - type: operator
i. Phenomenon 2 (groupPhenomenon1)
A. Data - operation type: swap; operation structures: Global State 12
(Ck), Global State 10 (Cn); parameters: none
4. QuantityTask 3 - Assembles matrix M and strength vector G for solving Ckp1 (see the note after this list)
(a) GroupTask 3 - type: assembler; assembler structure: Global State 17 (M)
i. Phenomenon 3 (Diffusion)
A. Quantity 2 - coupling: none; parameters: none
(b) GroupTask 4 - type: assembler; assembler structure: Global State 20 (trS)
i. Phenomenon 3 (Diffusion)
A. Quantity 3 - coupling: Phenomenon 1 (Elasticity) Global State 7
(trSig), Phenomenon 3 (Diffusion) Global State 12 (Ck); parameters:
none
(c) GroupTask 5 - type: assembler; assembler structure: Global State 19 (Kc)
i. Phenomenon 3 (Diffusion)
A. Quantity 4 - coupling: none; parameters: none
B. Quantity 5 - coupling: Phenomenon 3 (Diffusion) Global State 20
(trS); parameters: none
(d) GroupTask 6 - type: assembler; assembler structure: Global State 18 (G)
i. Phenomenon 3 (Diffusion)
A. Quantity 6 - coupling: Phenomenon 2 (groupPhenomenon1) Global
State 16 (gSystemData1); parameters: none
(e) GroupTask 7 - type: operator
i. Phenomenon 2 (groupPhenomenon1)
A. Data - operation type: matVecMult; operation structures: Global
State 21 (aux0), Global State 17 (M), Global State 10 (Cn); parame-
ters: multiplication factor = Global State 16 (gSystemData1), inverse
= true
(f) GroupTask 8 - type: operator
i. Phenomenon 2 (groupPhenomenon1)
A. Data - operation type: addVec; operation structures: Global State 18
(G), Global State 18 (G), Global State 21 (aux0); parameters: none
(g) GroupTask 9 - type: operator
i. Phenomenon 2 (groupPhenomenon1)
A. Data - operation type: matLinearComb; operation structures: Global
State 17 (M), Global State 17 (M), Global State 19 (Kc); parameters:
first parameter = Global State 16 (gSystemData1), inverse = true,
second parameter = 1, inverse = false
5. QuantityTask 4 - Introduces boundary conditions, modifying M and G
(a) GroupTask 10 - type: neummanBC; assembler structure: Global State 18 (G)
i. Phenomenon 3 (Diffusion)
A. Quantity 7 - coupling: Phenomenon 2 (groupPhenomenon1) Global
State 16 (gSystemData1); parameters: none
(b) GroupTask 11 - type: essentialBC; assembler structure: Global State 17 (M),
Global State 18 (G)
i. Phenomenon 3 (Diffusion)
A. Quantity 8 - coupling: Phenomenon 2 (groupPhenomenon1) Global
State 16 (gSystemData1); parameters: none
6. QuantityTask 5 - Computes Ckp1 using matrix M and strength vector G
(a) GroupTask 12 - type: solver
i. Phenomenon 2 (groupPhenomenon1)
A. Data - operation type: solver; operation structures: Global State 17
(M), Global State 18 (G), Global State 13 (Ckp1); parameters: de-
pends on the solver
7. QuantityTask 6 - Prepares data for residue computation
(a) GroupTask 3 - type: assembler; assembler structure: Global State 17 (M)
i. Phenomenon 3 (Diffusion)
A. Quantity 2 - coupling: none; parameters: none
(b) GroupTask 13 - type: assembler; assembler structure: Global State 20 (trS)
i. Phenomenon 3 (Diffusion)
A. Quantity 9 - coupling: Phenomenon 1 (Elasticity) Global State 7 (trSig),
Phenomenon 3 (Diffusion) Global State 13 (Ckp1); parameters:
none
(c) GroupTask 5 - type: assembler; assembler structure: Global State 19 (Kc)
i. Phenomenon 3 (Diffusion)
A. Quantity 4 - coupling: none; parameters: none
B. Quantity 5 - coupling: Phenomenon 3 (Diffusion) Global State 20
(trS); parameters: none
(d) GroupTask 6 - type: assembler; assembler structure: Global State 18 (G)
i. Phenomenon 3 (Diusion)
A. Quantity 6 - coupling: Phenomenon 2 (groupPhenomenon1) Global
State 16 (gSystemData1); parameters: none
(e) GroupTask 7 - type: operator
i. Phenomenon 2 (groupPhenomenon1)
A. Data - operation type: matVecMult; operation structures: Global
State 21 (aux0), Global State 17 (M), Global State 10 (Cn); parame-
ters: multiplication factor: Global State 16 (gSystemData1), inverse
= true
(f) GroupTask 8 - type: operator
i. Phenomenon 2 (groupPhenomenon1)
A. Data - operation type: addVec; operation structures: Global State 18
(G), Global State 18 (G), Global State 21 (aux0); parameters: none
(g) GroupTask 9 - type: operator
i. Phenomenon 2 (groupPhenomenon1)
A. Data - operation type: matLinearComb; operation structures: Global
State 17 (M), Global State 17 (M), Global State 19 (Kc); parameters:
first parameter = Global State 16 (gSystemData1), inverse = true,
second parameter = 1, inverse = false
8. QuantityTask 4 - Introduces boundary conditions, modifying M and G
(a) GroupTask 10 - type: neummanBC; assembler structure: Global State 18 (G)
i. Phenomenon 3 (Diffusion)
A. Quantity 7 - coupling: Phenomenon 2 (groupPhenomenon1) Global
State 16 (gSystemData1); parameters: none
(b) GroupTask 11 - type: essentialBC; assembler structure: Global State 17 (M),
Global State 18 (G)
i. Phenomenon 3 (Diffusion)
A. Quantity 8 - coupling: Phenomenon 2 (groupPhenomenon1) Global
State 16 (gSystemData1); parameters: none
9. QuantityTask 7 - Computes errC
(a) GroupTask 14 - type: operator
i. Phenomenon 2 (groupPhenomenon1)
A. Data - operation type: matVecMult; operation structures: Global
State 21 (aux0), Global State 17 (M), Global State 13 (Ckp1); pa-
rameters: none
(b) GroupTask 15 - type: operator
i. Phenomenon 2 (groupPhenomenon1)
A. Data - operation type: vecSubtract; operation structures: Global
State 21 (aux0), Global State 21 (aux0), Global State 18 (G); pa-
rameters: none
(c) GroupTask 16 - type: operator
i. Phenomenon 2 (groupPhenomenon1)
A. Data - operation type: vecNorm2; operation structures: Global State
24 (aux3), Global State 21 (aux0); parameters: none
(d) GroupTask 17 - type: operator
i. Phenomenon 2 (groupPhenomenon1)
A. Data - operation type: subtract; operation structures: Global State
12 (Ck), Global State 13 (Ckp1), Global State 12 (Ck); parameters:
none
(e) GroupTask 18 - type: operator
i. Phenomenon 2 (groupPhenomenon1)
A. Data - operation type: vecNorm2; operation structures: Global State
15 (errC), Global State 12 (Ck); parameters: none
(f) GroupTask 19 - type: operator
i. Phenomenon 2 (groupPhenomenon1)
A. Data - operation type: vecNorm2; operation structures: Global State
23 (aux2), Global State 13 (Ckp1); parameters: none
(g) GroupTask 20 - type: operator
i. Phenomenon 2 (groupPhenomenon1)
A. Data - operation type: scalarMax; operation structures: Global State
15 (errC), Global State 15 (errC), Global State 24 (aux3); parameters:
multiplication factor = Global State 23 (aux2), inverse = true
10. QuantityTask 8 - Computes dt of this group after the beginning
(a) GroupTask 21 - type: assembler; assembler structure: Global State 16 (gSystemData1)
i. Phenomenon 3 (Diffusion)
A. Quantity 10 - coupling: none; parameters: none
11. QuantityTask 9 - Updates gSystemData1
(a) GroupTask 22 - type: assembler; assembler structure: Global State 16 (gSystemData1)
i. Phenomenon 3 (Diffusion)
A. Quantity 11 - coupling: Phenomenon 4 (groupPhenomenon2) Global
State 25 (gSystemData2); parameters: none
12. QuantityTask 10 - Updates Cnp1 value for the next iteration
(a) GroupTask 23 - type: operator
i. Phenomenon 2 (groupPhenomenon1)
A. Data - operation type: swap; operation structures: Global State 11
(Cnp1), Global State 12 (Ck); parameters: none
13. QuantityTask 11 - Updates Ck value for the next iteration
(a) GroupTask 24 - type: operator
i. Phenomenon 2 (groupPhenomenon1)
A. Data - operation type: swap; operation structures: Global State 12
(Ck), Global State 13 (Ckp1); parameters: none
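As noted in QuantityTask 3 above, reading GroupTasks 7 to 9 together suggests the linear
system that QuantityTask 5 eventually solves. Assuming that the multiplication factor taken
from Global State 16 (gSystemData1) is the time step dt (the tables only name the global
state, so this is an assumption) and writing Delta t for it, the matLinearComb with inverse
= true scales M by 1/Delta t and adds Kc, while the matVecMult followed by addVec adds the
corresponding history term to G, giving

\[
  \Big( \tfrac{1}{\Delta t}\, M + K_{c} \Big)\, C_{k+1}
  \;=\;
  G + \tfrac{1}{\Delta t}\, M\, C_{n} ,
\]

which is consistent with an implicit time discretization of a diffusion-type equation for C.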
A.5.3 QuantityTasks for Group 2
1. QuantityTask 0 - Computes minimum time step
(a) GroupTask 0 - type: assembler; assembler structure: Global State 25 (gSystemData2)
i. Phenomenon 4 (groupPhenomenon2)
A. Quantity 0 - coupling: Phenomenon 0 (groupPhenomenon0) Global
State 6(gSystemData0), Phenomenon 2 (groupPhenomenon1) Global
State 16 (gSystemData1); parameters: none
2. QuantityTask 1 - Updates time data for next time step computation
(a) GroupTask 1 - type: assembler; assembler structure: Global State 25 (gSystemData2)
i. Phenomenon 4 (groupPhenomenon2)
A. Quantity 1 - coupling: none; parameters: none
3. QuantityTask 2 - Initialize time data
(a) GroupTask 2 - type: assembler; assembler structure: Global State 25 (gSystemData2)
i. Phenomenon 4 (groupPhenomenon2)
A. Quantity 2 - coupling: none; parameters: none
A.5.4 QuantityTasks for Group 3
1. QuantityTask 0 - Retrieves initial data from the simulation
(a) GroupTask 0 - type: posProc
i. Phenomenon 6 (PostProcessing)
A. Quantity 0 - coupling: Phenomenon 1 (Elasticity); parameters: none
B. Quantity 1 - coupling: Phenomenon 3 (Diffusion); parameters: none
C. Quantity 2 - coupling: Phenomenon 4 (groupPhenomenon2); parameters: none
2. QuantityTask 1 - Retrieves data during the simulation
(a) GroupTask 0 - type: posProc
i. Phenomenon 6 (PostProcessing)
A. Quantity 0 - coupling: Phenomenon 1 (Elasticity); parameters: none
B. Quantity 1 - coupling: Phenomenon 3 (Diffusion); parameters: none
C. Quantity 2 - coupling: Phenomenon 4 (groupPhenomenon2); parameters: none