abstract tasks can be matched dynamically to Condor resources. The matching is done by the Condor matchmaker, which matches the requirements of an abstract task, specified in a Condor classAd,* against the preferences that the resources publish in their own classAds. We also note that Pegasus uses DAGMan as an execution engine (Figure 13.3). Currently, Pegasus and DAGMan are being integrated into a single system, Pegasus-WMS, which provides the user with an end-to-end workflow solution.
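The matchmaking can be pictured as follows. The sketch below is a minimal Python illustration of the idea, not Condor's actual classAd language or API: each party publishes its attributes together with a requirements predicate over the other party's attributes, and the matchmaker pairs a task with the best-ranked resource that both sides accept.

    # Minimal sketch of classAd-style matchmaking (illustrative only).
    # All names and attributes here are made-up examples.
    task = {
        "attrs": {"owner": "alice", "image_size_mb": 512},
        "requirements": lambda r: r["arch"] == "X86_64" and r["memory_mb"] >= 512,
        "rank": lambda r: r["mips"],  # prefer faster machines
    }

    resources = [
        {"name": "node1", "arch": "X86_64", "memory_mb": 1024, "mips": 2000,
         "requirements": lambda t: t["image_size_mb"] <= 2048},
        {"name": "node2", "arch": "ARM64", "memory_mb": 4096, "mips": 3000,
         "requirements": lambda t: True},
    ]

    def matchmake(task, resources):
        """Return the best resource where both sides' requirements hold."""
        candidates = [r for r in resources
                      if task["requirements"](r)
                      and r["requirements"](task["attrs"])]
        return max(candidates, key=task["rank"], default=None)

    print(matchmake(task, resources)["name"])  # -> node1

Both the task's requirements and the resource's own requirements must be satisfied for a match; among the candidates, the task's rank expression breaks ties.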
Pegasus performs a mapping of the entire workflow, portions of the workflow, or individual tasks onto the available resources. In the simplest case, Pegasus chooses the sources of the input data (assuming that the data are replicated in the environment) and the locations where the tasks are to be executed. Pegasus provides an interface to a user-defined scheduler and includes a number of scheduling algorithms. As with many scheduling algorithms, the quality of the schedule depends on the quality of the available information, both about the execution and data-access times of the tasks and about the resources themselves.
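To make the mapping step concrete, here is a hypothetical greedy site-selection pass in the spirit of such a scheduling algorithm; the names, estimates, and cost model are assumptions for illustration, not Pegasus's actual interface. Each task is assigned to the site with the smallest estimated runtime plus input-transfer cost, so the resulting schedule is only as good as those estimates.

    # Hypothetical greedy site selection (illustrative, not Pegasus's API).
    runtime_estimate = {          # seconds, per (task, site); assumed known
        ("t1", "siteA"): 30, ("t1", "siteB"): 10,
        ("t2", "siteA"): 20, ("t2", "siteB"): 60,
    }
    transfer_estimate = {"siteA": 5, "siteB": 15}  # cost to stage inputs

    def select_sites(tasks, sites):
        mapping = {}
        for task in tasks:
            # Pick the site minimizing estimated runtime + transfer cost.
            mapping[task] = min(
                sites,
                key=lambda s: runtime_estimate[(task, s)] + transfer_estimate[s],
            )
        return mapping

    print(select_sites(["t1", "t2"], ["siteA", "siteB"]))
    # -> {'t1': 'siteB', 't2': 'siteA'}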
In addition to the basic mapping algorithm, Pegasus can perform the following optimizations: task clustering, data reuse, data cleanup, and partitioning. Before the workflow mapping, the original workflow can be partitioned into any number of subworkflows, each of which is then mapped by Pegasus. The order of the mapping is dictated by the dependencies between the subworkflows, and in some cases the subworkflows can be mapped and executed in parallel. The granularity of the partitioning is dictated by how quickly the target execution resources change. In a dynamic environment, partitions with small numbers of tasks are preferable, so that only a small number of tasks are bound to resources at any one time. In a dedicated execution environment, on the other hand, the entire workflow can be mapped at once.
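The partition-level mapping can be sketched as a dependency-ordered traversal. The following fragment is illustrative only (the partitions and dependencies are made up): a subworkflow is mapped only once its predecessors have been mapped, and partitions that become ready together could be mapped, and executed, in parallel.

    # Dependency-ordered mapping of subworkflow partitions (sketch).
    from graphlib import TopologicalSorter  # Python 3.9+

    # partition -> set of partitions it depends on (assumed example)
    deps = {"p1": set(), "p2": {"p1"}, "p3": {"p1"}, "p4": {"p2", "p3"}}

    ts = TopologicalSorter(deps)
    ts.prepare()
    while ts.is_active():
        ready = ts.get_ready()       # these partitions are independent,
        for partition in ready:      # so they could be mapped in parallel
            print(f"mapping {partition} onto currently available resources")
            ts.done(partition)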
Pegasus can also reuse intermediate data products if they are available and thus possibly reduce the amount of computation that needs to be performed. Pegasus also adds data cleanup nodes to the workflow, which remove data at the execution sites when they are no longer needed; this often results in a reduced workflow data footprint.
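A toy sketch of these two data optimizations, with made-up tasks and files (not Pegasus code): tasks whose outputs are already registered in a replica catalog are pruned from the workflow, and a cleanup action is scheduled immediately after the last task that consumes each file.

    # Data reuse and data cleanup on a toy workflow (illustrative only).
    workflow = [   # (task, inputs, outputs), in execution order (assumed)
        ("t1", [], ["f1"]),
        ("t2", ["f1"], ["f2"]),
        ("t3", ["f1", "f2"], ["f3"]),
    ]
    replica_catalog = {"f1"}   # intermediate products already available

    # Data reuse: prune tasks whose outputs all exist already.
    remaining = [t for t in workflow
                 if not all(o in replica_catalog for o in t[2])]

    # Data cleanup: remove each file right after its last consumer.
    last_use = {}
    for task, inputs, _ in remaining:
        for f in inputs:
            last_use[f] = task
    for task, inputs, outputs in remaining:
        print(f"run {task}")
        for f, consumer in last_use.items():
            if consumer == task:
                print(f"  cleanup {f}")

Here t1 is pruned because f1 is already in the catalog, and f1 and f2 are removed as soon as t3, their last consumer, has run.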
Finally, Pegasus can also perform task clustering, treating a set of tasks as a single unit for the purposes of scheduling to a remote location. The execution of the cluster at the remote site can be sequential or parallel (if applicable). Task clustering can be beneficial for workflows with fine computational granularity. Pegasus has also been used in conjunction with resource-provisioning techniques to improve overall workflow performance [53].
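As a rough illustration of clustering's effect, the sketch below groups independent fine-grained tasks into fixed-size clusters, each submitted as a single job; the fixed-size grouping is an assumption for illustration rather than a description of Pegasus's implementation.

    # Toy fixed-size task clustering (illustrative only).
    def cluster(tasks, size):
        """Group a list of independent tasks into clusters of `size`."""
        return [tasks[i:i + size] for i in range(0, len(tasks), size)]

    tasks = [f"t{i}" for i in range(1, 8)]    # seven short-running tasks
    for i, group in enumerate(cluster(tasks, 3)):
        # One submission (and one queue wait) now covers several tasks.
        print(f"job {i}: run {group} sequentially at remote site")

The benefit is that the per-job overheads of submission and queuing are paid once per cluster rather than once per fine-grained task.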
13.4.3 Workflow Execution
In this section, we contrast approaches to workflow execution in Pegasus,
Triana, and Kepler. Pegasus can map workflows onto a variety of target
* Classified advertisement