Distributed Workflows in Bioinformatics - Parallel Computing for Bioinformatics and Computational Biology



Figure 23.9b shows the workflow as composed in Wildfire. The parallel “foreach” construct, pforeach, is used because the number of clusters and the number of motifs are not known a priori. This is a frequent occurrence in many bioinformatics workflows, and hence this particular example is a good illustration of the pforeach construct. The construct allows the pipeline enclosed in the pforeach box to be executed as many times as there are files in the directory that match a particular “glob” pattern, with the executions running in parallel.
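The idea behind pforeach can be sketched outside GEL. The following Python fragment is only an illustration, not Wildfire's or GEL's implementation: the function names pforeach, count_lines, and demo, the file names, and the use of a thread pool are all assumptions made for the example. It applies one pipeline step to every file matching a glob pattern, without knowing in advance how many files there will be.

```python
import glob
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


def pforeach(pattern, pipeline, max_workers=4):
    """Run `pipeline` once per file matching `pattern`, in parallel.

    Hypothetical helper: the number of tasks is only known once the
    glob pattern has been expanded, as with the pforeach construct.
    """
    paths = sorted(glob.glob(pattern))
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(pipeline, paths))
    return dict(zip(paths, results))


def count_lines(path):
    """Stand-in for a real pipeline stage (e.g., a motif finder)."""
    with open(path) as fh:
        return sum(1 for _ in fh)


def demo():
    """Create a few illustrative files and process only the matching ones."""
    with tempfile.TemporaryDirectory() as workdir:
        for name, n_lines in [("cluster1.fa", 2), ("cluster2.fa", 3), ("notes.txt", 5)]:
            with open(os.path.join(workdir, name), "w") as fh:
                fh.write("x\n" * n_lines)
        results = pforeach(os.path.join(workdir, "cluster*.fa"), count_lines)
        return {os.path.basename(p): v for p, v in results.items()}
```

Calling demo() processes the two files that match cluster*.fa and ignores notes.txt; adding more matching files changes the number of parallel tasks without any change to the code.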

23.7.3 Parameter Estimation Using Swarm Intelligence

The next example demonstrates how to run a swarm intelligence algorithm over the grid.

Real-life optimization problems are often intractable, and heuristics are then the only practical choice for finding near-optimal solutions. Particle Swarm Optimization [19] is one such heuristic, based on simulating the information exchange between leaders and followers observed in, for example, bird flocking.

The algorithm simulates individuals flying through the search space. On each iteration, the individuals are separated into a set of leaders and a set of followers, based on their fitness. The followers use the locations of the leaders to change their flying direction, that is, their search velocity. The new location of each individual is computed from its current location and flying direction. The new locations are used to rank the fitness of the individuals and, subsequently, to rebuild the leader and follower sets. This process is repeated until a satisfactory (near-optimal) solution is found. In the swarm algorithm, each individual of the swarm works independently after obtaining information about the leaders. Hence, it is computationally advantageous to parallelize the algorithm.
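The iteration described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the algorithm of the cited references: the update rule (a random step toward a randomly chosen leader), the fixed iteration count used as the stopping rule, the toy sphere objective, and all function and parameter names are choices made for the example.

```python
import random


def sphere(x):
    """Toy fitness function (lower is better); stands in for a real model fit."""
    return sum(v * v for v in x)


def swarm_minimize(fitness, dim, n_individuals=20, n_leaders=5,
                   iterations=100, bounds=(-5.0, 5.0), seed=0):
    """Leader/follower swarm sketch; every update rule here is an assumption."""
    rng = random.Random(seed)
    lo, hi = bounds
    swarm = [[rng.uniform(lo, hi) for _ in range(dim)]
             for _ in range(n_individuals)]
    for _ in range(iterations):
        # Rank by fitness and split into leaders and followers.
        swarm.sort(key=fitness)
        leaders, followers = swarm[:n_leaders], swarm[n_leaders:]
        moved = []
        # Each follower needs only the leader set, so this loop is the
        # part that could be distributed across workers.
        for x in followers:
            leader = rng.choice(leaders)
            # Flying direction: a random fraction of the way to the leader.
            velocity = [rng.random() * (l - xi) for xi, l in zip(x, leader)]
            # New location = current location + flying direction, kept in bounds.
            moved.append([min(hi, max(lo, xi + vi))
                          for xi, vi in zip(x, velocity)])
        swarm = leaders + moved
    return min(swarm, key=fitness)
```

Because the leaders are carried over unchanged, the best individual never gets worse from one iteration to the next. A call such as swarm_minimize(sphere, dim=3) returns the best location found after the fixed number of iterations.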

The workflow in Figure 23.10 is a simplified implementation of a swarm algorithm by Ray et al. [50]. The algorithm is applied to a parameter estimation problem for a biochemical pathway model consisting of 36 unknowns and eight ODEs [45]. Components Initialize, Evaluate, and Collate are used to initialize and rank the individuals. Component Test determines whether the workflow should terminate, and Extract collects the results on termination of the simulation. Component ReEval is used to evaluate the fitness of an individual; note that the outer parallel loop evaluates the fitness of each follower. The remaining components are used to select the leaders and followers.

Note that there is a cyclic dependency in the workflow: Classify depends on Collate initially and on Reassign thereafter (transitively), whereas Reassign in turn depends transitively on Classify. The while loop in GEL allows such dependencies and so is crucial for this workflow.

Figure 23.10b shows the swarm workflow as composed in Wildfire. Figure 23.11 shows the GEL script generated upon building the workflow in Wildfire.

The swarm algorithm script in Figures 23.10b and 23.11 shows two new syntax constructs not seen in the previous two examples: the pfor and while loops. The pfor initializes the swarm with random values. The while body contains the stages in the evolution of the swarm, and the loop condition is the test for ending the algorithm. The pforeach in the loop body represents the set of followers updating
