performed manually, and leverage this platform to make their
exploration trackable and reproducible.
A scientific workflow, as a marriage of workflow technologies and
domain science, is a specialized form of workflow designed
specifically to compose and execute a series of computational or
data manipulation applications in science. In this chapter and the
next, we use biological science as an exemplary domain and focus
on Web service-based workflow systems.
Automate. To achieve a fully functional data pipeline, scientists
used to switch among browsers, copy data from one Web page,
convert it, and paste it into another. Scientific workflow
systems allow them to build a flow model of their data
processing pipeline, typically as a graph or a script. The flow
model integrates multiple steps, including invocations of
Web services and other facilitating steps. Each service receives
data from previous steps, and facilitating steps provide utilities
such as reading data from external sources and transforming
data formats. Once a flow model is built, its execution is
handled by a workflow engine. As a result, scientists no longer
need to switch between browsers and software tools, record
intermediate and final results, or transform data manually.
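The flow model and engine described above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the step functions (fetch_sequence, fasta_to_raw, gc_content) and the run_workflow engine are hypothetical names that do not come from any particular workflow system, and the "Web service" is a local stand-in that returns fixed data rather than making a network call.

```python
from typing import Any, Callable, List, Tuple

# A step is a (name, function) pair; the flow model is an ordered list of steps.
Step = Tuple[str, Callable[[Any], Any]]


def fetch_sequence(accession: str) -> str:
    """Hypothetical stand-in for a Web service call; returns FASTA-like text."""
    return f">{accession}\nacgtacgt\n"


def fasta_to_raw(fasta: str) -> str:
    """Facilitating step: drop header lines and join the sequence lines."""
    return "".join(
        line for line in fasta.splitlines() if not line.startswith(">")
    )


def gc_content(sequence: str) -> float:
    """Analysis step: fraction of G and C bases in the sequence."""
    sequence = sequence.upper()
    return (sequence.count("G") + sequence.count("C")) / len(sequence)


def run_workflow(steps: List[Step], data: Any) -> Any:
    """Toy engine: feed the output of each step into the next one."""
    for _name, func in steps:
        data = func(data)
    return data


# The flow model: service invocation, format conversion, then analysis.
pipeline: List[Step] = [
    ("fetch", fetch_sequence),
    ("convert", fasta_to_raw),
    ("analyze", gc_content),
]

result = run_workflow(pipeline, "X00001")  # "X00001" is a made-up identifier
print(result)
```

A linear list is the simplest possible flow model; real systems generally use a graph so that steps can branch and merge, but the division of labor between model and engine is the same.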
Audit. During the execution of a scientific workflow, the
workflow system also keeps track of the data involved through all
transformations, analyses, and interpretations. This audit
information, also referred to as data lineage or provenance [174],
is of paramount importance in ensuring the validity and
reproducibility of the experiments undertaken. This audit feature
makes a scientific workflow system a favorable constituent
of the increasingly popular electronic laboratory notebook
systems [175].
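One way to picture the audit feature is an engine that writes a lineage record for every step it executes. The sketch below is an assumption-laden illustration, not the record format of any real workflow system: run_with_provenance, digest, and the record fields are hypothetical, and the steps are trivial string operations chosen only to keep the example self-contained.

```python
import hashlib
from typing import Any, Callable, Dict, List, Tuple

Step = Tuple[str, Callable[[Any], Any]]


def digest(value: Any) -> str:
    """Short fingerprint of a value, used to identify data in the lineage."""
    return hashlib.sha256(repr(value).encode("utf-8")).hexdigest()[:12]


def run_with_provenance(
    steps: List[Step], data: Any
) -> Tuple[Any, List[Dict[str, str]]]:
    """Run the steps in order and record one lineage entry per step."""
    lineage: List[Dict[str, str]] = []
    for name, func in steps:
        output = func(data)
        lineage.append(
            {"step": name, "input": digest(data), "output": digest(output)}
        )
        data = output
    return data, lineage


# Hypothetical three-step pipeline over a raw sequence string.
steps: List[Step] = [
    ("strip", str.strip),
    ("upper", str.upper),
    ("length", len),
]

result, lineage = run_with_provenance(steps, "  acgt \n")
for record in lineage:
    print(record)
```

Because the output fingerprint of each record equals the input fingerprint of the next, the records form a chain that can be followed backward from a final result to the original input.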
Reuse. Scientific workflows enable reuse in two different
scenarios. In the first, a workflow template can be instantiated
many times, each time with a different input, a different parameter
setting, or both. This scenario, usually called
input-and-parameter-sweep, is quite common in biological
experiments, where a routine needs to be tried on different
specimens and under various settings.
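The first reuse scenario can be sketched as running one template over every combination of inputs and parameter settings. The names here (count_motif as the template, sweep as the driver) and the sample sequences are hypothetical and chosen purely for illustration.

```python
from itertools import product
from typing import Callable, Dict, List, Tuple


def count_motif(sequence: str, motif: str) -> int:
    """Workflow template: count (possibly overlapping) motif occurrences."""
    width = len(motif)
    return sum(
        1
        for i in range(len(sequence) - width + 1)
        if sequence[i:i + width] == motif
    )


def sweep(
    template: Callable[[str, str], int],
    inputs: List[str],
    parameters: List[str],
) -> Dict[Tuple[str, str], int]:
    """Instantiate the template once per (input, parameter) combination."""
    return {(x, p): template(x, p) for x, p in product(inputs, parameters)}


specimens = ["ACGTACGT", "GGGG"]  # made-up input sequences
motifs = ["CG", "GG"]             # made-up parameter settings

results = sweep(count_motif, specimens, motifs)
for (specimen, motif), count in results.items():
    print(specimen, motif, count)
```

Keeping the template separate from the sweep driver means the same routine can be rerun on new specimens or settings without being modified.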