In SDF (synchronous data-flow), each actor has fixed token consumption and production rates. In Kepler this allows the SDF director to construct an actor firing schedule prior to executing the workflow.[21] This also allows the SDF director to readily execute workflows in a single thread, firing actors one at a time based on the schedule.
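To make the idea of a precomputed firing schedule concrete, the following sketch (all names hypothetical; this is not Kepler code) balances a two-actor producer/consumer chain from its fixed token rates and emits one period of a single-threaded schedule:

```python
from math import gcd

def repetitions(prod, cons):
    """Solve the SDF balance equation reps_A * prod == reps_B * cons
    for the smallest positive repetition counts."""
    g = gcd(prod, cons)
    return cons // g, prod // g  # (firings of producer A, firings of consumer B)

def schedule(prod, cons):
    """Build a single-threaded firing schedule for a chain A -> B,
    firing the consumer as soon as enough tokens have accumulated."""
    r_a, r_b = repetitions(prod, cons)
    sched, buffered = [], 0
    fired_a = fired_b = 0
    while fired_a < r_a or fired_b < r_b:
        if buffered >= cons and fired_b < r_b:
            sched.append("B")      # consumer firing: reads `cons` tokens
            buffered -= cons
            fired_b += 1
        else:
            sched.append("A")      # producer firing: emits `prod` tokens
            buffered += prod
            fired_a += 1
    return sched

# A emits 2 tokens per firing, B consumes 3: A must fire 3 times, B twice.
print(schedule(2, 3))  # ['A', 'A', 'B', 'A', 'B']
```

Because the rates are fixed, this schedule is valid for every execution of the workflow; the director can simply replay it.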
Workflows employing the PN and SDF directors in Kepler may include cycles in the workflow graph. We use the term DAG to refer to a model of computation that restricts the workflow graph W to a directed, acyclic graph of task dependencies. In DAG, each actor node in W is executed only once, and each actor A in W is executed only after all actors A′ preceding A (denoted A′ ≺W A) in W have finished their execution. Note that we make no assumption about whether W is executed sequentially or task parallel; we only require that any DAG-compatible schedule for W satisfy the partial order ≺W induced by W. A DAG director can obtain all legal schedules for W (i.e., the relation ≺W) via a topological sort of W. Finally, note that the DAG model can easily support task and data parallelism, but not pipeline parallelism.
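A topological sort of this kind is available in Python's standard library. The sketch below (a hypothetical four-actor workflow, not tied to any particular workflow system) derives one legal sequential schedule and also the task-parallel "ready" batches permitted by the partial order:

```python
from graphlib import TopologicalSorter

# Hypothetical DAG workflow W: each actor maps to the set of actors
# that must finish before it (its predecessors under the partial order).
W = {
    "align":  {"fetch"},
    "filter": {"fetch"},
    "merge":  {"align", "filter"},
}

# One legal sequential schedule, consistent with the partial order.
order = list(TopologicalSorter(W).static_order())
print(order)

# Task-parallel view: actors in the same batch have no dependencies
# among them and could be executed concurrently.
ts = TopologicalSorter(W)
ts.prepare()
while ts.is_active():
    ready = ts.get_ready()   # all predecessors of these actors have finished
    print(sorted(ready))
    ts.done(*ready)
```

Note that the second loop exhibits the task and data parallelism the DAG model supports, while pipeline parallelism (overlapping successive data items within one actor chain) has no counterpart here, since each actor fires exactly once.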
Another model of computation, extending PN, is COMAD (Collection-Oriented Modeling and Design).[59, 89] In this MoC, actors operate on streams of nested data collections (similar to XML data), and can be configured (via XPath-like scope expressions and signatures) to “pick up” and operate only on relevant parts of the input stream, injecting results back into the output stream for further downstream processing. This MoC can simplify workflow design and reuse when compared with DAG, SDF, and PN workflows.[22]
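The following minimal sketch illustrates the COMAD idea of scoped actors (all names are hypothetical and the "scope" here is a simple tag predicate rather than a real XPath-like expression): an actor transforms only the sub-collections matching its scope, while everything else flows through unchanged.

```python
def apply_in_scope(node, scope, f):
    """Apply actor function f only to sub-collections whose tag matches
    the scope predicate; pass all other parts of the stream through."""
    if isinstance(node, dict):   # a named collection, e.g. {"Sample": [...]}
        return {tag: (f(sub) if scope(tag) else apply_in_scope(sub, scope, f))
                for tag, sub in node.items()}
    if isinstance(node, list):   # an ordered sequence of collections/items
        return [apply_in_scope(x, scope, f) for x in node]
    return node                  # atomic data item outside the scope

# A nested stream resembling XML data; the actor only "picks up" Samples.
stream = {"Project": [{"Sample": [1, 2]}, {"Sample": [3]}, {"Log": ["ok"]}]}
doubled = apply_in_scope(stream,
                         lambda tag: tag == "Sample",
                         lambda xs: [2 * x for x in xs])
print(doubled)  # {'Project': [{'Sample': [2, 4]}, {'Sample': [6]}, {'Log': ['ok']}]}
```

Because the actor declares what it operates on rather than where it sits in a fixed graph, the same configuration can be reused over differently structured streams, which is the source of the design and reuse benefit noted above.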
13.2.4 Benefits of Scientific Workflows
Scientific workflows are designed to help scientists perform effective compu-
tational experiments by providing an environment that simplifies (
in silico
)
experimental design, implementation, and documentation. The increasing use
of scientific workflow environments and systems is due to a number of advan-
tages these systems can offer over alternative approaches.
Scientific workflows automate repetitive tasks, allowing scientists to focus on the science driving the experiment instead of data and process management. For example, automation of parameter studies, where the same process is performed hundreds to thousands of times with different parameter sets, can often be achieved more easily and efficiently than with conventional programming approaches.
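As a rough illustration of what such automation replaces (the model function and parameter names are hypothetical), a parameter study amounts to enumerating a grid of parameter sets and running the same computation over each one:

```python
from itertools import product

def model(alpha, n):
    """Stand-in for the scientific computation run once per parameter set."""
    return alpha * n

# A parameter grid: every combination of values is one run of the study.
grid = {"alpha": [0.1, 0.5], "n": [10, 100, 1000]}
runs = [dict(zip(grid, values)) for values in product(*grid.values())]

results = {tuple(p.values()): model(**p) for p in runs}
print(len(runs))  # 6 parameter sets, one run each
```

A workflow system takes over exactly this bookkeeping (enumeration, dispatch, collection of results) and can additionally distribute the independent runs across compute resources.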
Scientific workflows explicitly document the scientific process being performed, which can lead to better communication, collaboration (e.g., sharing of workflows among scientists), and reproducibility of results.
Scientific workflow systems can be used to monitor workflow execution and record the provenance of workflow results. Provenance, in particular,