Database Reference
is not required by end users, since their needs tend to be limited to specific
tasks, such as experiment reproducibility or validation of an analysis.
To support the vision of provenance of electronic data, we make the distinction
between process documentation, a representation of past processes as they
occur inside computer systems, and provenance queries, which extract relevant
information from process documentation to support users' needs.
Process documentation is collected during execution of processes or work-
flows and begins to be accumulated well before data are produced, or even
before it is known that some dataset is to be produced. Hence, management
of such process documentation is different from metadata management. In
practice, in a given application context, users may identify commonly asked
provenance queries, which can be precomputed, and for which the results are
stored and made available.
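The split between process documentation and provenance queries can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class ProvenanceStore, its methods, and the file names are hypothetical and do not come from any particular provenance system. Records are appended while the process runs, a query walks them afterwards, and answers to commonly asked queries are stored for reuse.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StepRecord:
    """One piece of process documentation: a step, its inputs, its output."""
    step: str
    inputs: tuple
    output: str


class ProvenanceStore:
    """Hypothetical store: accumulates records and answers lineage queries."""

    def __init__(self):
        self.records = []       # process documentation, in execution order
        self.precomputed = {}   # stored answers to commonly asked queries

    def record(self, step, inputs, output):
        # Called during execution, possibly long before the final data exist.
        self.records.append(StepRecord(step, tuple(inputs), output))

    def lineage(self, item):
        """Provenance query: names of all steps that contributed to `item`."""
        if item in self.precomputed:
            return self.precomputed[item]
        producers = {r.output: r for r in self.records}
        steps, pending, seen = [], [item], set()
        while pending:
            current = pending.pop()
            if current in seen or current not in producers:
                continue  # already visited, or a source item with no producer
            seen.add(current)
            rec = producers[current]
            steps.append(rec.step)
            pending.extend(rec.inputs)
        return sorted(steps)

    def precompute(self, item):
        # Assumes the documentation relevant to `item` is already complete.
        self.precomputed[item] = self.lineage(item)


store = ProvenanceStore()
store.record("calibrate", ["raw.dat"], "calibrated.dat")
store.record("filter", ["calibrated.dat"], "filtered.dat")
store.record("plot", ["filtered.dat", "style.cfg"], "figure.png")
store.precompute("figure.png")      # a commonly asked query, answered once
print(store.lineage("figure.png"))  # served from the stored result
```

A real system would also need to invalidate stored results when new documentation arrives for an item; the sketch leaves that out.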
Similar to the earlier discussion of different metadata layers, we can think
of provenance as consisting of descriptions at different levels of abstraction,
essentially aimed at different audiences: to support scientific reproducibil-
ity, engineering reproducibility, or even deeper understanding of the process
that created the derived data (we provide an example of the latter in the
context of scientific workflows below). In terms of scientific reproducibility,
where scientists want to share and verify their findings with colleagues inside
or outside their collaboration, the user may need to know what datasets
were used, what type of analysis was run, and with what parameters. However,
in cases where the results need to be reproduced bit by bit, more
detailed information is needed about the hardware architecture of the resource,
the environment variables used, library versions, and the like. Finally,
provenance can also be used to analyze the performance of the analyses [17],
where the provenance records are mined to determine the number of tasks
executed, their runtime distribution, where the execution took place, and
so forth.
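Two of these levels lend themselves to a short sketch: capturing environment details for bit-by-bit reproduction, and mining provenance records for performance figures. The Python below is a minimal illustration, not a reference to any existing provenance tool. The record layout (task, host, runtime_s), the function names, and the sample values are all assumptions made up for the example.

```python
import os
import platform
import statistics
from collections import Counter


def capture_environment(env_vars=("PATH", "LD_LIBRARY_PATH")):
    """Snapshot details that bit-by-bit reproduction may need."""
    return {
        "machine": platform.machine(),            # hardware architecture
        "system": platform.system(),              # operating system name
        "python_version": platform.python_version(),
        # Variables that are not set are recorded as None.
        "env": {name: os.environ.get(name) for name in env_vars},
    }


def summarize_tasks(records):
    """Mine provenance records for simple performance figures."""
    runtimes = [r["runtime_s"] for r in records]
    return {
        "task_count": len(records),
        "mean_runtime_s": statistics.mean(runtimes),
        "max_runtime_s": max(runtimes),
        "tasks_per_host": dict(Counter(r["host"] for r in records)),
    }


# Made-up records, standing in for what a workflow system might log.
records = [
    {"task": "align", "host": "node-a", "runtime_s": 12.0},
    {"task": "align", "host": "node-b", "runtime_s": 18.0},
    {"task": "merge", "host": "node-a", "runtime_s": 3.0},
]
environment = capture_environment()
summary = summarize_tasks(records)
print(summary)
```

Recording the library versions of every dependency would follow the same pattern as the environment snapshot; it is omitted here to keep the sketch short.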
In some cases, scientific processes are managed by workflow management
systems. These may take in an abstract workflow description and generate an
executable workflow. During the mapping, the workflow system may modify
the executable workflow to the point that it is no longer easy to map between
what has been executed and what the user specified [18]. As a result, information
about the workflow restructuring process needs to be recorded as well [19].
This information not only allows us to relate the user-created and the exe-
cutable workflow but is also the foundation for workflow debugging, where the
user can trace how the specification they provided evolved into an executable
workflow.
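One way to picture such a record of the restructuring process is a log that the mapper appends to each time it changes the workflow. The Python sketch below is hypothetical: the MappingLog class, the action names ("cluster", "add_stage_in"), and the task and job names are invented for illustration and are not taken from any specific workflow management system.

```python
class MappingLog:
    """Hypothetical log of how abstract tasks became executable jobs."""

    def __init__(self):
        self.events = []        # restructuring steps, in the order applied
        self.job_to_tasks = {}  # executable job -> abstract tasks behind it

    def record(self, action, abstract_tasks, job):
        """Note that `action` turned `abstract_tasks` into (or attached) `job`."""
        self.events.append((action, tuple(abstract_tasks), job))
        self.job_to_tasks.setdefault(job, set()).update(abstract_tasks)

    def origin_of(self, job):
        """Relate an executed job back to the user-specified tasks."""
        return sorted(self.job_to_tasks.get(job, ()))

    def history_of(self, task):
        """Debugging aid: every restructuring step that touched `task`."""
        return [(action, job)
                for action, tasks, job in self.events
                if task in tasks]


log = MappingLog()
# The mapper merges two small user tasks into one executable job ...
log.record("cluster", ["fft_1", "fft_2"], "cluster_job_0")
# ... and adds a data staging job on behalf of one of them.
log.record("add_stage_in", ["fft_1"], "stage_in_0")

print(log.origin_of("cluster_job_0"))
print(log.history_of("fft_1"))
```

Keeping the events in order is what lets a user trace how a task they specified evolved step by step into what was actually run.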
In the area of workflow management and provenance, an interesting aspect
of workflow creation is the ability to retrace how a particular workflow has
been designed, or in other words, to determine the provenance of the workflow
creation process. A particularly interesting approach is taken in VisTrails [20, 21],
where the user is presented with a graphical interface for workflow creation
and the system incrementally saves the state of the workflow as it is being