modifications to a workflow, the VisTrails system records each change. Instead
of storing a set of related workflows, the change-based model stores the
operations, or actions, that are applied to the workflows (e.g., the addition or
deletion of a module, the addition or deletion of a connection between modules,
and the modification of a parameter value). This representation, which is
similar to that used by source-code control systems such as Subversion, uses
substantially less space than the alternative of explicitly storing each version
of a workflow. In addition, VisTrails provides an intuitive interface that helps
users both understand and interact with the version history of a workflow
design [68]. This tree-based view (see Figure 13.4) allows a user to return to a
previous version, undo changes, compare different workflows, and determine the
actions that led to a particular result.
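The change-based model can be illustrated with a small sketch. Each node of a version tree stores only the single action that produced it and a pointer to its parent; a workflow version is rebuilt by replaying the actions on the path from the root. The names `Action`, `VersionTree`, and `materialize`, and the action vocabulary, are hypothetical and chosen for illustration; this is not VisTrails' actual data model or API.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Action:
    """One hypothetical edit operation, e.g. ("add_module", ("reader",))."""
    op: str
    args: tuple


@dataclass
class Version:
    vid: int
    parent: Optional[int]      # None only for the empty root version
    action: Optional[Action]   # the single change that produced this version


@dataclass
class Workflow:
    modules: dict = field(default_factory=dict)    # module name -> {param: value}
    connections: set = field(default_factory=set)  # (source, target) pairs


class VersionTree:
    """Stores actions only; any version is rebuilt by replaying its path."""

    def __init__(self):
        self.versions = {0: Version(0, None, None)}

    def add(self, parent: int, action: Action) -> int:
        """Apply `action` on top of `parent`, creating a new child version."""
        vid = len(self.versions)
        self.versions[vid] = Version(vid, parent, action)
        return vid

    def path(self, vid: int) -> list:
        """Actions from the root down to `vid`, oldest first."""
        actions = []
        while self.versions[vid].parent is not None:
            actions.append(self.versions[vid].action)
            vid = self.versions[vid].parent
        return actions[::-1]

    def materialize(self, vid: int) -> Workflow:
        """Rebuild the workflow for version `vid` from scratch."""
        wf = Workflow()
        for a in self.path(vid):
            if a.op == "add_module":
                wf.modules[a.args[0]] = {}
            elif a.op == "delete_module":
                name = a.args[0]
                wf.modules.pop(name, None)
                # Deleting a module also drops the connections that touch it.
                wf.connections = {c for c in wf.connections if name not in c}
            elif a.op == "add_connection":
                wf.connections.add(a.args)
            elif a.op == "delete_connection":
                wf.connections.discard(a.args)
            elif a.op == "set_parameter":
                module, param, value = a.args
                wf.modules[module][param] = value
            else:
                raise ValueError(f"unknown action: {a.op!r}")
        return wf


tree = VersionTree()
v1 = tree.add(0, Action("add_module", ("reader",)))
v2 = tree.add(v1, Action("add_module", ("render",)))
v3 = tree.add(v2, Action("add_connection", ("reader", "render")))
v4 = tree.add(v3, Action("set_parameter", ("render", "colormap", "gray")))
v5 = tree.add(v3, Action("set_parameter", ("render", "colormap", "jet")))  # branch
print(tree.materialize(v4).modules)  # {'reader': {}, 'render': {'colormap': 'gray'}}
print(tree.materialize(v5).modules["render"])  # {'colormap': 'jet'}
```

Because `v4` and `v5` are siblings under `v3`, exploring an alternative parameter value costs one stored action rather than a full copy of the workflow, and returning to a previous version amounts to materializing a parent node. Replaying from the root is the simplest strategy; a production system might cache materialized versions to avoid repeated replay.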
Query languages and user interfaces that allow users to explore the provenance
of workflow runs are also important [59, 64, 68, 69, 87]. For example, the
ability to query both the specification and the provenance of computational
tasks enables users to better understand those tasks and their results. In this
way, users can identify workflows that are suitable for, and can be reused in, a
given task; identify workflow instances that have been found to contain
anomalies; and compare and understand the differences between workflows
[59, 68, 69]. Many existing workflow systems support querying and visualizing
the provenance information associated with the workflow definition and
execution layers (e.g., see Moreau et al. [70]).
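The three kinds of queries listed above (finding reusable workflows, finding anomalous runs, and comparing workflows) can be made concrete with a sketch over in-memory records. The `Spec` and `Run` record types and the query functions are hypothetical simplifications for illustration; they are not the query language of VisTrails or of any of the cited systems, which capture far richer provenance.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Spec:
    """Hypothetical workflow specification: a name and the modules it uses."""
    name: str
    modules: frozenset


@dataclass(frozen=True)
class Run:
    """Hypothetical execution record linking a run to its specification."""
    run_id: int
    spec: str
    flagged_anomaly: bool


def reusable_for(specs, required_modules):
    """Specifications that already contain every module a task needs."""
    required = frozenset(required_modules)
    return sorted(s.name for s in specs if required <= s.modules)


def anomalous_runs(runs, spec_name=None):
    """Runs flagged as anomalous, optionally restricted to one specification."""
    return [r.run_id for r in runs
            if r.flagged_anomaly and (spec_name is None or r.spec == spec_name)]


def diff(a: Spec, b: Spec):
    """Modules present only in `a`, and modules present only in `b`."""
    return sorted(a.modules - b.modules), sorted(b.modules - a.modules)


specs = [
    Spec("isosurface", frozenset({"reader", "contour", "render"})),
    Spec("volume", frozenset({"reader", "volume_render"})),
]
runs = [Run(1, "isosurface", False), Run(2, "volume", True), Run(3, "isosurface", True)]

print(reusable_for(specs, {"reader", "render"}))  # ['isosurface']
print(anomalous_runs(runs))                       # [2, 3]
print(diff(specs[0], specs[1]))                   # (['contour', 'render'], ['volume_render'])
```

The point of the sketch is that the first and third queries touch only the specification, while the second needs execution provenance; answering all three requires both kinds of information to be queryable together.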
13.5.1 Example Implementation of a Provenance Framework
Figure 13.5 shows the high-level architecture for the provenance framework
employed within the SDM Center. The architecture has been implemented
with the goal of supporting scientists as they run large-scale simulations [71, 72].
At the heart of this framework is the provenance store, which includes one
or more databases providing physical storage, as well as various application
programming interfaces (APIs) to access and manage provenance information.
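A provenance store of this kind can be sketched with Python's built-in `sqlite3` module: one database provides the physical storage, and a small class provides the access API. The schema (`invocation`, `param`, `artifact`, `used`) and the method names are assumptions for illustration, since the text does not describe the SDM framework's actual schema or APIs. The `lineage` method uses a recursive SQL query to walk from a data product back through the invocations that produced it to the original inputs.

```python
import sqlite3

# Hypothetical minimal schema (not the SDM Center's actual schema).
SCHEMA = """
CREATE TABLE invocation (                 -- one execution of an actor
    id INTEGER PRIMARY KEY,
    actor TEXT NOT NULL,
    status TEXT NOT NULL                  -- e.g. 'executing', 'terminated'
);
CREATE TABLE param (                      -- parameters at invocation time
    invocation_id INTEGER REFERENCES invocation(id),
    name TEXT NOT NULL,
    value TEXT
);
CREATE TABLE artifact (                   -- a data product (e.g. a file)
    id INTEGER PRIMARY KEY,
    path TEXT UNIQUE NOT NULL,
    produced_by INTEGER REFERENCES invocation(id)  -- NULL for raw inputs
);
CREATE TABLE used (                       -- which inputs an invocation read
    invocation_id INTEGER REFERENCES invocation(id),
    artifact_id INTEGER REFERENCES artifact(id)
);
"""


class ProvenanceStore:
    """Toy provenance store: one SQLite database plus a small access API."""

    def __init__(self, path=":memory:"):
        self.db = sqlite3.connect(path)
        self.db.executescript(SCHEMA)

    def artifact(self, path, produced_by=None):
        """Register a data product and return its id."""
        with self.db:  # commits the insert on success
            cur = self.db.execute(
                "INSERT INTO artifact(path, produced_by) VALUES (?, ?)",
                (path, produced_by))
        return cur.lastrowid

    def invocation(self, actor, inputs, params, status="terminated"):
        """Record one actor invocation with its inputs and parameters."""
        with self.db:
            cur = self.db.execute(
                "INSERT INTO invocation(actor, status) VALUES (?, ?)",
                (actor, status))
            inv = cur.lastrowid
            self.db.executemany("INSERT INTO used VALUES (?, ?)",
                                [(inv, a) for a in inputs])
            self.db.executemany("INSERT INTO param VALUES (?, ?, ?)",
                                [(inv, k, str(v)) for k, v in params.items()])
        return inv

    def params(self, invocation_id):
        """Parameters recorded for one invocation, as a dict of strings."""
        rows = self.db.execute(
            "SELECT name, value FROM param WHERE invocation_id = ?",
            (invocation_id,))
        return dict(rows.fetchall())

    def lineage(self, artifact_id):
        """Paths of all upstream artifacts, found with a recursive query."""
        rows = self.db.execute("""
            WITH RECURSIVE up(id) AS (
                SELECT ?
                UNION
                SELECT u.artifact_id
                FROM up
                JOIN artifact a ON a.id = up.id
                JOIN used u ON u.invocation_id = a.produced_by
            )
            SELECT path FROM artifact
            WHERE id IN (SELECT id FROM up) AND id != ?
            ORDER BY path
        """, (artifact_id, artifact_id)).fetchall()
        return [r[0] for r in rows]


store = ProvenanceStore()
raw = store.artifact("run42/raw.nc")
filt_inv = store.invocation("Filter", [raw], {"threshold": 0.5})
filtered = store.artifact("run42/filtered.nc", produced_by=filt_inv)
plot_inv = store.invocation("Plot", [filtered], {"colormap": "jet"})
plot = store.artifact("run42/plot.png", produced_by=plot_inv)
print(store.lineage(plot))     # ['run42/filtered.nc', 'run42/raw.nc']
print(store.params(filt_inv))  # {'threshold': '0.5'}
```

Storing lineage as edges (which invocation read which artifact, and which invocation produced which artifact) keeps each insert cheap and defers the cost to query time, when the recursive query follows the edges upstream.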
The provenance store within the SDM framework captures the following
types of information:
Process monitoring information, which includes data transfer rates,
sizes of files moved, time taken for actor execution and checkpointing,
memory usage, process states (initiated, executing, waiting, terminated,
aborted), and so forth. This information is useful, for example, to
benchmark workflow execution and detect bottlenecks.
Data provenance and lineage information, which links an actor's data
output to (1) the specific actor invocation that created the data, (2)
the relevant data inputs, and (3) the parameters at the time of
invocation. Data provenance allows a scientist to interpret and “debug”
analysis results, for example, by stepping through time in the processing
history, thus tracing back (intermediate) results to the inputs that