determine which methods and components are most suitable for the particular
datasets under investigation. Such exploratory workflow design is common
when developing new analysis methods. Conversely, some applications require
the development of production workflows to be executed on a regular basis
with new datasets or simulation parameters (e.g., environmental monitoring
and analysis workflows or the fusion simulation workflow in Section 13.3).
Another important distinction has to do with what the workflow components
(called actors or tasks) represent and model. In science-oriented workflows,
actors model a scientific method or process. In such workflows, individual
workflow steps generally are meaningful to the scientist; that is, they more
or less directly correspond to high-level steps of the scientific method being
automated. Contrasting with science-oriented workflows are resource-oriented
workflows. Actors and workflow steps in the latter model the required data-
and resource-handling tasks rather than the science. In such cases, the actual
analytical or simulation operations might be “hidden” from the workflow system,
and instead the workflow directly handles the “plumbing” tasks such as data
movement, data replication, and job management (submit, pause, resume,
abort, etc.). The simulation management workflow in Section 13.3 is an
example of such a resource-oriented “plumbing workflow.”
13.2.3 Models of Computation
Consider a workflow graph W consisting of actors (tasks, workflow steps)
and connections (directed edges) between them. * With W we can associate
a set of parameters p, input datasets x, and output datasets y. A model of
computation (MoC) M prescribes how to execute the parameterized workflow
$W_p$ on x to obtain y. Therefore, we can view a MoC as a mapping
$M : \mathcal{W} \times \mathcal{P} \times \mathcal{X} \to \mathcal{Y}$, which for any workflow
$W \in \mathcal{W}$, parameter settings $p \in \mathcal{P}$, and inputs $x \in \mathcal{X}$ uniquely
determines the workflow outputs $y \in \mathcal{Y}$. We denote this by
$y = M(W_p(x))$. While most current scientific workflow systems employ a
single MoC, the Kepler system [18], due to its heritage from Ptolemy [16], supports
more than one such MoC: For each MoC M, there is a corresponding
director of the same name which implements M.
For example, consider the PN (process network) model of computation.
Using the PN director in Kepler, a workflow W executes as a dataflow process
network [19, 20]. In PN each actor executes as a separate, data-driven process (or
thread) which is continuously running. Actor connections in PN correspond
to unidirectional channels (modeled as unbounded queues) over which ordered
token streams are sent, and actors in PN block (wait) only when there are not
enough tokens available on the actor's input ports. Process networks naturally
support pipeline parallelism as well as task and data parallelism.
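The PN execution model described above can be illustrated with a minimal Python sketch. This is not Kepler's PN director or its API. The names `source`, `scale`, `collect`, `run_pn`, and the end-of-stream marker `_END` are hypothetical and chosen only for illustration. Each actor runs in its own thread, channels are unbounded FIFO queues, and an actor blocks only when its input channel is empty. The function `run_pn` plays the role of M for one fixed three-actor workflow W, so its return value corresponds to y = M(W_p(x)). Unlike a real process network, in which actors run continuously, this sketch shuts down when the input stream ends.

```python
import queue
import threading

# Hypothetical end-of-stream marker so the sketch can terminate.
# This is an assumption of the example, not a feature of the PN model.
_END = object()


def source(items, out):
    """Actor with no inputs: emit each item of x as a token, in order."""
    for item in items:
        out.put(item)
    out.put(_END)


def scale(factor, inp, out):
    """Actor with one input port: multiply each token by the parameter."""
    while True:
        token = inp.get()  # blocks only while the input channel is empty
        if token is _END:
            out.put(_END)
            return
        out.put(token * factor)


def collect(inp, results):
    """Actor with no outputs: gather the tokens that form the output y."""
    while True:
        token = inp.get()
        if token is _END:
            return
        results.append(token)


def run_pn(p, x):
    """Execute the fixed workflow source -> scale -> collect, PN style."""
    a = queue.Queue()  # maxsize=0 means unbounded, so writers never block
    b = queue.Queue()
    y = []
    actors = [
        threading.Thread(target=source, args=(x, a)),
        threading.Thread(target=scale, args=(p["factor"], a, b)),
        threading.Thread(target=collect, args=(b, y)),
    ]
    for actor in actors:
        actor.start()
    for actor in actors:
        actor.join()
    return y
```

Because every channel has a single writer and a single reader and preserves token order, the output depends only on W, p, and x, not on how the threads are scheduled. The three actors can also work on different tokens at the same time, which is the pipeline parallelism mentioned above.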
* Here we ignore a number of details: actor ports, subworkflows “hidden” within so-called
composite actors, and so forth.