The primary task of a scientific workflow system is to automate the ex-
ecution of scientific workflows. Scientific workflow systems may additionally
support users in the design, composition, and verification of scientific work-
flows. They also may include support for monitoring the execution of work-
flows in real time; recording the processing history of data; planning resource
allocation in distributed execution environments; discovering existing work-
flows and workflow components; recording the lineage of data and evolution of
workflows; and generally managing scientific data. Thus, a scientific workflow
system primarily serves as a workflow execution engine , but may also include
features of problem-solving environments (PSE). 10
Wainer et al. describe some of the differences between business (or “office
automation”) workflows and scientific workflows, stating, “whereas office
work is about goals, scientific work is about data”. 11 Business workflows are
mainly concerned with the modeling of business rules, policies, and case man-
agement, and therefore are often control- and activity-oriented. In contrast, to
support the work of computational scientists, scientific workflows are mainly
concerned with capturing scientific data analysis or simulation processes and
the associated management of data and computational resources. While scien-
tific workflow technology and research can inherit and adopt techniques from
the field of business workflows, there are several, sometimes subtle, differences
ranging from the modeling paradigms used to the underlying computation
models employed to execute workflows. 86 For example, scientific workflows
are usually dataflow-oriented “analysis pipelines” that often exhibit pipeline
parallelism over data streams in addition to supporting the data parallelism
and task parallelism common in business workflows. * In some cases (for ex-
ample, in seismic or geospatial data processing 12 ), scientific workflows execute
as digital signal processing (DSP) pipelines. In contrast, traditional workflows
often deal with case management (for example, insurance claims, mortgage
applications), tend to be more control-intensive, and lend themselves to very
different models of computation.
In Section 13.2 we introduce basic concepts and describe key characteris-
tics of scientific workflows. In Section 13.3 we provide a detailed case study
from a fusion simulation project where scientific workflows are used to man-
age complex scientific simulations. Section 13.4 describes scientific workflow
systems currently in use and in development. Section 13.5 introduces and dis-
cusses basic notions of data and workflow provenance in the scientific workflow
context, and describes how workflow systems monitor execution and manage
provenance. Finally, Section 13.6 describes approaches for enabling workflow
reuse, sharing, and collaboration.
* In the parallel computing literature, task parallelism refers to distributing tasks (processes)
across different parallel computing nodes, and data parallelism involves distributing data across
multiple nodes. Pipeline parallelism is a more specific condition that arises whenever multiple
processes arranged in a linear sequence execute simultaneously.
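The three forms of parallelism distinguished above can be illustrated with a minimal sketch of pipeline parallelism. The code below is an assumption for illustration only (the stage functions and queue-based wiring are invented, not drawn from any workflow system discussed in this chapter): three stages arranged in a linear sequence each run in their own thread, so later items enter stage 1 while earlier items are still being processed by stages 2 and 3, mirroring how a dataflow-oriented analysis pipeline streams data through its steps.

```python
import queue
import threading

SENTINEL = None  # marks the end of the data stream


def stage(fn, inbox, outbox):
    """Read items from inbox, apply fn, and forward results downstream."""
    while True:
        item = inbox.get()
        if item is SENTINEL:
            outbox.put(SENTINEL)  # propagate end-of-stream to the next stage
            return
        outbox.put(fn(item))


def run_pipeline(items, *fns):
    """Wire the stage functions together with queues and stream items through."""
    queues = [queue.Queue() for _ in range(len(fns) + 1)]
    threads = [
        threading.Thread(target=stage, args=(fn, queues[i], queues[i + 1]))
        for i, fn in enumerate(fns)
    ]
    for t in threads:
        t.start()
    for item in items:
        queues[0].put(item)  # all stages run concurrently on the stream
    queues[0].put(SENTINEL)
    results = []
    while True:
        out = queues[-1].get()
        if out is SENTINEL:
            break
        results.append(out)
    for t in threads:
        t.join()
    return results


# A toy three-stage "analysis pipeline" over a stream of five items.
print(run_pipeline(range(5),
                   lambda x: x + 1,   # stage 1: preprocess
                   lambda x: x * x,   # stage 2: transform
                   lambda x: x - 1))  # stage 3: postprocess
# → [0, 3, 8, 15, 24]
```

Data parallelism, by contrast, would replicate a single stage across nodes and partition the input among the copies, while task parallelism would run independent stages of a workflow graph on different nodes.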