Another differentiator between systems is the granularity of the data that a
provenance management system uses to keep track of the origins of the data.
Again, the coupling of the provenance approach to the execution technology
can influence the capability of the provenance management system from a
granularity viewpoint. For instance, some workflow systems, such as Pegasus,
that allow files to be manipulated by command-line programs tend to
track the provenance of files (and not the data they contain). This capability
is sufficient in some cases, but is too coarse-grained in others. Systems such as
Kepler, 51 on the other hand, have specific capabilities to track the provenance
of collections. Other systems, such as VisTrails, are capable of tracking the
origins of programs. The PASOA system has been demonstrated to capture
provenance for data at multiple levels of granularity (files, file contents, col-
lections, etc.), and its integration with Pegasus showed it could be used to
track the change in the workflow produced by the Pegasus workflow compiler.
Systems such as Earth Systems Science Server (ES3) 59 and Provenance-
Aware Storage System (PASS) 60 capture events at the level of the operating
system, typically reconstructing a provenance representation of files. In such
a context, workflow scripts are seen as files whose origin can also be tracked.
The database community has also investigated the concept of provenance.
Reusing the terminology introduced in this section, their solutions can gener-
ally be regarded as integrated with databases themselves: Given a data prod-
uct stored in a database, they track the origin of data derivations produced by
views and queries. 61 From a granularity viewpoint, provenance attributes can
be applied to tables, rows, and even cells. To accommodate activities taking
place outside databases, provenance models that support copy-and-paste op-
erations across databases have also been proposed. 62 Such provenance models
begin to resemble those for workflows, and research is required to integrate
them smoothly.
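To make the granularity levels concrete, the following minimal Python sketch
attaches provenance annotations to a table, a row, or a single cell. The class
name ProvenanceAnnotations, its methods, and the sample table and source
strings are all hypothetical. The rule that a cell without its own annotation
falls back to its row and then to its table is an assumption made for
illustration, not a behavior prescribed by the systems cited above.

```python
class ProvenanceAnnotations:
    """Hypothetical store of provenance notes at table, row, or cell level."""

    def __init__(self):
        # Key: (table, row, column); row/column are None for coarser levels.
        self._notes = {}

    def annotate(self, source, table, row=None, column=None):
        """Record where a table, a row, or a single cell came from."""
        if column is not None and row is None:
            raise ValueError("a cell annotation needs both row and column")
        self._notes[(table, row, column)] = source

    def lookup(self, table, row, column):
        """Return the most specific annotation that covers the given cell.

        Assumption: fall back from cell to row to table; None if nothing applies.
        """
        for key in ((table, row, column), (table, row, None), (table, None, None)):
            if key in self._notes:
                return self._notes[key]
        return None


# Illustrative data only: table, row ids, and source labels are invented.
notes = ProvenanceAnnotations()
notes.annotate("bulk import", table="genes")
notes.annotate("edited by curator", table="genes", row=7)
notes.annotate("copied from another database", table="genes", row=7, column="symbol")

print(notes.lookup("genes", 7, "symbol"))  # cell-level annotation
print(notes.lookup("genes", 7, "name"))    # falls back to the row
print(notes.lookup("genes", 3, "name"))    # falls back to the table
```

The cell-level entry stands in for the kind of record a copy-and-paste
provenance model would keep when a value is brought in from another database.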
Internally, provenance systems capture an explicit representation of the flow
of data within applications, and the associated processes that are executed. At
some level of abstraction all systems in the recent provenance challenge 63 use
some graph structure to express all dependencies between data and processes.
Such graphs are directed acyclic graphs that indicate from which ancestors
processes and data products are derived. Given such a consensus, a specifica-
tion for an open provenance model is emerging 64 and could potentially become
the lingua franca by which provenance systems could exchange information.
We illustrate this model over a concrete example in Section 12.6.
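The graph structure described above can be sketched in a few lines of Python.
This is a minimal illustration under stated assumptions, not the open
provenance model specification itself: the class ProvenanceGraph, its method
names, and the file and process names in the example are all hypothetical.
Nodes are either data products or processes, each edge records that a node was
derived from another, and an edge that would create a cycle is rejected so
that the graph stays acyclic.

```python
class ProvenanceGraph:
    """Hypothetical directed acyclic graph of data products and processes."""

    def __init__(self):
        self._kind = {}     # node name -> "data" or "process"
        self._parents = {}  # node name -> set of nodes it was derived from

    def add_node(self, name, kind):
        if kind not in ("data", "process"):
            raise ValueError("kind must be 'data' or 'process'")
        self._kind[name] = kind
        self._parents.setdefault(name, set())

    def add_dependency(self, node, depends_on):
        """Record that `node` was derived from `depends_on`."""
        if node not in self._kind or depends_on not in self._kind:
            raise KeyError("both nodes must be added before linking them")
        # Reject the edge if it would close a cycle.
        if node == depends_on or node in self.ancestors(depends_on):
            raise ValueError("dependency would create a cycle")
        self._parents[node].add(depends_on)

    def ancestors(self, node):
        """Return every node that `node` was directly or indirectly derived from."""
        seen = set()
        stack = list(self._parents.get(node, ()))
        while stack:
            current = stack.pop()
            if current not in seen:
                seen.add(current)
                stack.extend(self._parents.get(current, ()))
        return seen


# Illustrative pipeline only: raw.dat -> calibrate -> calibrated.dat -> plot -> figure.png
graph = ProvenanceGraph()
for name, kind in [("raw.dat", "data"), ("calibrate", "process"),
                   ("calibrated.dat", "data"), ("plot", "process"),
                   ("figure.png", "data")]:
    graph.add_node(name, kind)
graph.add_dependency("calibrate", "raw.dat")
graph.add_dependency("calibrated.dat", "calibrate")
graph.add_dependency("plot", "calibrated.dat")
graph.add_dependency("figure.png", "plot")

print(sorted(graph.ancestors("figure.png")))
```

Asking for the ancestors of figure.png walks the edges backward and returns
both the processes and the data products it was derived from, which is the
kind of query such a graph is meant to answer.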
12.5 Metadata in Scientific Applications
In this section we present a couple of examples of how scientific applications
manage their metadata.