abstract tasks can be matched dynamically to Condor resources. The matching is done by the Condor matchmaker, which matches the requirements of an abstract task, specified in a Condor classAd,* against the preferences that the resources publish in their own classAds. We also note that Pegasus uses DAGMan as an execution engine (Figure 13.3). Currently, Pegasus and DAGMan are being integrated into a single system, Pegasus-WMS, which provides the user with an end-to-end workflow solution.
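The matchmaking can be pictured as follows. The sketch below is a minimal Python illustration of the idea, not Condor's actual classAd language or API: each party publishes its attributes together with a requirements predicate over the other party's attributes, and the matchmaker pairs a task with the best-ranked resource that both sides accept.

    # Minimal sketch of classAd-style matchmaking (illustrative only).
    # All names and attributes here are made-up examples.
    task = {
        "attrs": {"owner": "alice", "image_size_mb": 512},
        "requirements": lambda r: r["arch"] == "X86_64" and r["memory_mb"] >= 512,
        "rank": lambda r: r["mips"],  # prefer faster machines
    }

    resources = [
        {"name": "node1", "arch": "X86_64", "memory_mb": 1024, "mips": 2000,
         "requirements": lambda t: t["image_size_mb"] <= 2048},
        {"name": "node2", "arch": "ARM64", "memory_mb": 4096, "mips": 3000,
         "requirements": lambda t: True},
    ]

    def matchmake(task, resources):
        """Return the best resource where both sides' requirements hold."""
        candidates = [r for r in resources
                      if task["requirements"](r)
                      and r["requirements"](task["attrs"])]
        return max(candidates, key=task["rank"], default=None)

    print(matchmake(task, resources)["name"])  # -> node1

Both the task's requirements and the resource's own requirements must be satisfied for a match; among the candidates, the task's rank expression breaks ties.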
Pegasus performs a mapping of the entire workflow, portions of the workflow, or individual tasks onto the available resources. In the simplest case, Pegasus chooses the sources of the input data (assuming that the data are replicated in the environment) and the locations where the tasks are to be executed. Pegasus provides an interface to a user-defined scheduler and includes a number of scheduling algorithms. As with many scheduling algorithms, the quality of the schedule depends on the quality of the available information, both about the execution and data-access times of the tasks and about the resources themselves.
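To make the mapping step concrete, here is a hypothetical greedy site-selection pass in the spirit of such a scheduling algorithm; the names, estimates, and cost model are assumptions for illustration, not Pegasus's actual interface. Each task is assigned to the site with the smallest estimated runtime plus input-transfer cost, so the resulting schedule is only as good as those estimates.

    # Hypothetical greedy site selection (illustrative, not Pegasus's API).
    runtime_estimate = {          # seconds, per (task, site); assumed known
        ("t1", "siteA"): 30, ("t1", "siteB"): 10,
        ("t2", "siteA"): 20, ("t2", "siteB"): 60,
    }
    transfer_estimate = {"siteA": 5, "siteB": 15}  # cost to stage inputs

    def select_sites(tasks, sites):
        mapping = {}
        for task in tasks:
            # Pick the site minimizing estimated runtime + transfer cost.
            mapping[task] = min(
                sites,
                key=lambda s: runtime_estimate[(task, s)] + transfer_estimate[s],
            )
        return mapping

    print(select_sites(["t1", "t2"], ["siteA", "siteB"]))
    # -> {'t1': 'siteB', 't2': 'siteA'}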
In addition to the basic mapping algorithm, Pegasus can perform the following optimizations: task clustering, data reuse, data cleanup, and partitioning. Before the workflow mapping, the original workflow can be partitioned into any number of subworkflows, each of which is then mapped by Pegasus. The order of the mapping is dictated by the dependencies between the subworkflows, and in some cases the subworkflows can be mapped and executed in parallel. The granularity of the partitioning is dictated by how quickly the target execution resources change. In a dynamic environment, partitions with small numbers of tasks are preferable, so that only a small number of tasks are bound to resources at any one time. In a dedicated execution environment, on the other hand, the entire workflow can be mapped at once.
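The partition-level mapping can be sketched as a dependency-ordered traversal. The following fragment is illustrative only (the partitions and dependencies are made up): a subworkflow is mapped only once its predecessors have been mapped, and partitions that become ready together could be mapped, and executed, in parallel.

    # Dependency-ordered mapping of subworkflow partitions (sketch).
    from graphlib import TopologicalSorter  # Python 3.9+

    # partition -> set of partitions it depends on (assumed example)
    deps = {"p1": set(), "p2": {"p1"}, "p3": {"p1"}, "p4": {"p2", "p3"}}

    ts = TopologicalSorter(deps)
    ts.prepare()
    while ts.is_active():
        ready = ts.get_ready()       # these partitions are independent,
        for partition in ready:      # so they could be mapped in parallel
            print(f"mapping {partition} onto currently available resources")
            ts.done(partition)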
Pegasus can also reuse intermediate data products if they are available and thus possibly reduce the amount of computation that needs to be performed. Pegasus also adds data cleanup nodes to the workflow, which remove data at the execution sites when they are no longer needed; this often results in a reduced workflow data footprint.
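A toy sketch of these two data optimizations, with made-up tasks and files (not Pegasus code): tasks whose outputs are already registered in a replica catalog are pruned from the workflow, and a cleanup action is scheduled immediately after the last task that consumes each file.

    # Data reuse and data cleanup on a toy workflow (illustrative only).
    workflow = [   # (task, inputs, outputs), in execution order (assumed)
        ("t1", [], ["f1"]),
        ("t2", ["f1"], ["f2"]),
        ("t3", ["f1", "f2"], ["f3"]),
    ]
    replica_catalog = {"f1"}   # intermediate products already available

    # Data reuse: prune tasks whose outputs all exist already.
    remaining = [t for t in workflow
                 if not all(o in replica_catalog for o in t[2])]

    # Data cleanup: remove each file right after its last consumer.
    last_use = {}
    for task, inputs, _ in remaining:
        for f in inputs:
            last_use[f] = task
    for task, inputs, outputs in remaining:
        print(f"run {task}")
        for f, consumer in last_use.items():
            if consumer == task:
                print(f"  cleanup {f}")

Here t1 is pruned because f1 is already in the catalog, and f1 and f2 are removed as soon as t3, their last consumer, has run.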
Finally, Pegasus can also perform task clustering, treating a set of tasks as a single unit for the purposes of scheduling to a remote location. The execution of the cluster at the remote site can be sequential or parallel (if applicable). Task clustering can be beneficial for workflows with fine computational granularity. Pegasus has also been used in conjunction with resource-provisioning techniques to improve overall workflow performance [53].
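As a rough illustration of clustering's effect, the sketch below groups independent fine-grained tasks into fixed-size clusters, each submitted as a single job; the fixed-size grouping is an assumption for illustration rather than a description of Pegasus's implementation.

    # Toy fixed-size task clustering (illustrative only).
    def cluster(tasks, size):
        """Group a list of independent tasks into clusters of `size`."""
        return [tasks[i:i + size] for i in range(0, len(tasks), size)]

    tasks = [f"t{i}" for i in range(1, 8)]    # seven short-running tasks
    for i, group in enumerate(cluster(tasks, 3)):
        # One submission (and one queue wait) now covers several tasks.
        print(f"job {i}: run {group} sequentially at remote site")

The benefit is that the per-job overheads of submission and queuing are paid once per cluster rather than once per fine-grained task.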
13.4.3 Workflow Execution
In this section, we contrast approaches to workflow execution in Pegasus,
Triana, and Kepler. Pegasus can map workflows onto a variety of target
* Classified advertisement