When comparing different workflow execution strategies and approaches, it seems that a "one size fits all" solution is hard, if not impossible, to achieve.* In Pegasus, for example, workflow execution is primarily via Condor/DAGMan, a very mature and reliable platform (with some built-in fault tolerance) for job-oriented scientific workflows that can be expressed as acyclic task dependency graphs. However, there are applications, such as scientific workflows over remote data streams (see, e.g., Altintas et al.12), which require other models of computation, for example, to express loops and streaming pipeline parallelism. Kepler inherits from Ptolemy16 a number of such advanced models of computation, which can even be combined in different ways.57 Kepler also adds new models, for example, a data-oriented model of computation called COMAD that results in workflows that are easier to build and understand.22,59 Triana, on the other hand, is a service-oriented system, supporting a wide variety of grid and service-oriented execution environments. In the end, user requirements and application constraints have to be taken into account when deciding which execution model or system to choose.
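To make the DAGMan-style execution model concrete, the following is a minimal sketch (not Pegasus or DAGMan itself) of how a workflow expressed as an acyclic task dependency graph can be run: tasks execute in an order in which every task's dependencies complete before the task itself. The four task names are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical four-task workflow as an acyclic task dependency
# graph: each task maps to the set of tasks it depends on.
workflow = {
    "extract": set(),
    "clean":   {"extract"},
    "analyze": {"clean"},
    "plot":    {"clean", "analyze"},
}

def run_workflow(dag):
    """Run tasks in an order that respects all dependency edges."""
    order = list(TopologicalSorter(dag).static_order())
    for task in order:
        # A real engine (e.g., DAGMan) would dispatch a job here,
        # with retries and fault tolerance; we just record the order.
        print(f"running {task}")
    return order

order = run_workflow(workflow)
```

A dataflow model with loops or streaming pipeline parallelism, as supported by Kepler's directors, cannot be captured by such a one-pass topological ordering, which is precisely the limitation the paragraph above describes.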
13.5 Workflow Provenance and Execution Monitoring
The absence of detailed provenance information presents difficulties for scientists wishing to share and reproduce results, validate the results of others, and reuse the knowledge involved in analyzing and generating scientific data. In addition, the lack of provenance information may also limit the longevity of data. For example, without sufficient information describing how data products were generated, the value and use of this data may be greatly diminished. Thus, many current scientific workflow systems provide mechanisms for recording the provenance of workflows and their associated data products. This provenance information can be used to answer a number of basic questions posed by scientists related to data, such as: Who created this data product and when? What were the processes used in creating this data product? Which data products were derived from this data product? And were these two data products derived from the same raw (input) data?
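The queries above all reduce to traversals of a lineage graph over data products. As a minimal sketch (the record layout and product names are hypothetical, not any particular system's provenance schema), suppose each product records its creator, the process that produced it, and its immediate inputs:

```python
# Hypothetical provenance store: one record per data product.
provenance = {
    "raw.dat":   {"creator": "sensor-01", "process": "acquire",   "inputs": []},
    "clean.dat": {"creator": "alice",     "process": "filter",    "inputs": ["raw.dat"]},
    "stats.csv": {"creator": "alice",     "process": "summarize", "inputs": ["clean.dat"]},
    "plot.png":  {"creator": "bob",       "process": "render",    "inputs": ["stats.csv"]},
}

def ancestors(product):
    """All data products this product was (transitively) derived from."""
    seen, stack = set(), list(provenance[product]["inputs"])
    while stack:
        p = stack.pop()
        if p not in seen:
            seen.add(p)
            stack.extend(provenance[p]["inputs"])
    return seen

def same_raw_input(a, b):
    """Were these two products derived from a common raw (input) product?"""
    def raw(p):
        # Raw inputs are the products with no inputs of their own.
        return {q for q in ancestors(p) | {p} if not provenance[q]["inputs"]}
    return bool(raw(a) & raw(b))
```

"Who created this product?" is a direct record lookup; "which products were derived from this one?" is the same traversal run over reversed edges. Answering such queries at scale, over runs that produce many thousands of products, is what makes the infrastructure below nontrivial.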
The software infrastructure required to accurately and efficiently answer these questions is far from trivial,60,88 especially in light of the need to make provenance-related software tools usable by domain scientists, who do not necessarily have programming expertise. For instance, while it may be
* The different approaches do not necessarily exclude each other, however: for example, Mandal et al.58 report on experiences in combining a Kepler frontend with a Pegasus backend, gaining features of both systems (but also limiting each system's more general capabilities). While end users typically avoid such "system mashups," interesting insights into the different approaches and capabilities can still be gained from such interoperability experiments.