Nodes can be connected by edges expressing causal dependencies between
artifacts and processes. The origin of an edge represents an effect, whereas
its destination represents a cause: The presence of an edge makes explicit the
causal dependency between the effect and its cause. In this presentation, we
focus on two types of edges: “wasGeneratedBy” and “used.” A “wasGeneratedBy”
edge expresses how an artifact depended on a process for its
generation, whereas a “used” edge indicates that a process relied on some
artifacts to be able to complete. An artifact can be generated by only a single
process, but it can be used by any number of processes, whereas a process can
use and generate any number of artifacts. To distinguish the multiple
dependent artifacts a process may rely upon, a notion of role is introduced,
allowing the nature of the causal dependency to be characterized explicitly.
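
The edge rules just described can be sketched as a small data structure. This is a minimal illustration, not a definitive implementation: the class name `ProvenanceGraph`, its methods, and the process, artifact, and role names are all assumptions made for the example and do not come from the text.

```python
from dataclasses import dataclass, field


@dataclass
class ProvenanceGraph:
    """Toy causal graph; every edge points from an effect to its cause."""

    # artifact -> (generating process, role); at most one entry per artifact
    was_generated_by: dict = field(default_factory=dict)
    # (process, artifact, role) triples; any number per process or artifact
    used: set = field(default_factory=set)

    def add_was_generated_by(self, artifact, process, role):
        """Record that `artifact` was generated by `process` in `role`."""
        existing = self.was_generated_by.get(artifact)
        if existing is not None and existing[0] != process:
            # An artifact can be generated by only a single process.
            raise ValueError(f"{artifact!r} already generated by {existing[0]!r}")
        self.was_generated_by[artifact] = (process, role)

    def add_used(self, process, artifact, role):
        """Record that `process` used `artifact` in `role`."""
        self.used.add((process, artifact, role))

    def inputs_of(self, process):
        """Map each role to the artifact the process used in that role."""
        return {role: art for proc, art, role in self.used if proc == process}


# Hypothetical names, chosen only for illustration.
graph = ProvenanceGraph()
graph.add_used("align_warp_1", "anatomy1.img", role="image")
graph.add_used("align_warp_1", "reference.img", role="reference")
graph.add_was_generated_by("warp1.params", "align_warp_1", role="out")
```

Here the role is what tells the two inputs of the same process apart: `graph.inputs_of("align_warp_1")` returns one artifact per role, and a second "wasGeneratedBy" edge naming a different process for the same artifact is rejected.
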
Using the above notation, we show a provenance graph generated from
the workflow adopted by the provenance challenge [63], which is inspired by
functional MRI (fMRI) workflows to create population-based “brain atlases”
from the fMRI Data Center's archive of high-resolution anatomical data [91]. In
summary, this workflow produces average images along the axes X, Y, and Z,
after aligning each input sample with a reference image. Note that like the
other applications discussed, neuroscience applications require provenance [92].
Figure 12.5 illustrates a subset of the provenance graph that is constructed
as the provenance challenge workflow executes. Such a graph is best read from right
to left: The right identifies an artifact, the Atlas X graphic, representing an
averaged image along the X axis; all the causal dependencies that led it to be
produced appear to its left. Provenance graphs are directed and acyclic, which
means that an artifact or a process cannot be (transitively) caused by itself.
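
Reading the graph "from right to left" amounts to following edges from an effect to its causes. The sketch below shows one way to collect all transitive causes of a node and to check acyclicity. The function names and every node name in `edges` are illustrative assumptions, not the contents of Figure 12.5.

```python
# Edges point from effect to cause, as in the text.
# All node names here are hypothetical placeholders.
edges = {
    "atlas_x.gif": ["convert"],   # artifact wasGeneratedBy process
    "convert": ["atlas_x.pgm"],   # process used artifact
    "atlas_x.pgm": ["slicer"],
    "slicer": ["atlas.img"],
    "atlas.img": ["softmean"],
    "softmean": ["resliced1.img", "resliced2.img"],
}


def causes_of(node, edges):
    """Return every node that transitively caused `node`."""
    seen = set()
    stack = [node]
    while stack:
        current = stack.pop()
        for cause in edges.get(current, []):
            if cause not in seen:
                seen.add(cause)
                stack.append(cause)
    return seen


def is_acyclic(edges):
    """True if no node is (transitively) caused by itself."""
    nodes = set(edges) | {c for causes in edges.values() for c in causes}
    return all(n not in causes_of(n, edges) for n in nodes)
```

With these definitions, `causes_of("atlas_x.gif", edges)` gathers everything to the "left" of the final artifact, and `is_acyclic` rejects any edge set in which a node reaches itself.
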
Whenever a scientific workflow system executes the fMRI workflow, it
incrementally produces the various elements of that graph (or an equivalent
representation) and stores them in a repository, usually referred to as a
provenance store or provenance catalog. Provenance queries can then be issued to
extract a subset of the documentation produced, according to the user's needs.
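
As a rough sketch of the store-then-query pattern, the example below records edges one at a time in an in-memory SQLite table and then issues a simple query. The table layout, the helper names `record` and `artifacts_used_by`, and all process, artifact, and role names are assumptions for illustration; real provenance stores use their own schemas and query interfaces.

```python
import sqlite3

# Hypothetical single-table schema: one row per causal dependency.
store = sqlite3.connect(":memory:")
store.execute("CREATE TABLE edge (kind TEXT, effect TEXT, cause TEXT, role TEXT)")


def record(kind, effect, cause, role):
    """Append one causal dependency, as a workflow system might during a run."""
    store.execute("INSERT INTO edge VALUES (?, ?, ?, ?)", (kind, effect, cause, role))


# Dependencies arrive incrementally as each step completes.
record("used", "softmean", "resliced1.img", "in1")
record("used", "softmean", "resliced2.img", "in2")
record("wasGeneratedBy", "atlas.img", "softmean", "out")


def artifacts_used_by(process):
    """Provenance query: which artifacts did `process` rely on, ordered by role?"""
    rows = store.execute(
        "SELECT cause FROM edge WHERE kind = 'used' AND effect = ? ORDER BY role",
        (process,),
    )
    return [cause for (cause,) in rows]
```

A call such as `artifacts_used_by("softmean")` extracts only the subset of the recorded documentation that concerns one process, which is the kind of selective retrieval the text describes.
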
In conclusion, provenance is critical in many scientific applications ranging
from neuroscience to astronomy. As scientific applications become increasingly
open and integrated across areas, provenance interoperability becomes
an important requirement for systems technologies.
12.7 Current and Future Challenges
There are several challenges in the area of metadata and provenance management.
They stem mostly from two facts: (1) scientists need to share information
about data within their collaborations and with outside colleagues, and
(2) the amount of data and related information is growing at unprecedented
scales.