approaches, technologies, and implementations. In the recent provenance
challenge, 50 sixteen different systems were used to answer typical provenance
queries pertaining to a brain atlas dataset that was produced by a demon-
strator workflow in the context of functional magnetic resonance imaging.
Inspired by the summary of contributions in Moreau and Ludaescher, 50 we
present key characteristics of provenance systems. Most provenance systems
are embedded inside an execution environment, such as a workflow system or
an operating system. In such a context, embedded provenance systems can
track all the activities of this execution environment and are capable of pro-
viding a description of data produced by such environments. We characterize
such systems as integrated environments, since they offer multiple functionalities,
including workflow editing, workflow execution, provenance collection,
and provenance querying. 21, 51, 52 Integrated environments have some benefits,
including usability and seamless integration between the different activities.
From a provenance viewpoint, there is close semantic integration between the
provenance representation and the workflow model, which allows an efficient
representation to be adopted. 53 The downside of integrated systems is that the
tight coupling of components rarely allows for their substitution or use in
combination with other useful technologies; such systems therefore have difficulties
interoperating with others, a requirement of many large-scale scientific
applications.
In contrast to integrated provenance environments, approaches such as
Provenance-Aware Service-Oriented Architecture (PASOA) 54, 55 and Karma 56
adopt separate, autonomous provenance stores. As execution proceeds, applications
produce process documentation that is recorded in a storage system,
usually referred to as a provenance store. Such systems give the provenance
store an important role, since it offers long-term, persistent, secure storage of
process documentation. Provenance of data products can be extracted from
provenance stores by issuing queries to them. Over time, provenance stores
need to be managed to ensure that process documentation remains accessible
and usable in the long term. In particular, PASOA has adopted a provenance
model that is independent of the technology used for executing the appli-
cation. PASOA was demonstrated to operate with multiple workflow tech-
nologies, including Pegasus, 19 Virtual Data Language (VDL) 57 and Business
Process Execution Language (BPEL). 58 By favoring open data models and open
interfaces, this approach allows scientists to adopt the technologies of
their choice to run applications. However, a common provenance model would
allow past executions to be described in a coherent manner, even when
multiple technologies are involved.
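The idea of a separate provenance store can be illustrated with a minimal sketch. It is not the PASOA or Karma interface: the class names, method names, activities, and file names below are all hypothetical, chosen only to show applications recording process documentation and a later query extracting the provenance of a data product.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ProcessRecord:
    """One piece of process documentation: an activity plus the data it used and produced."""
    activity: str
    inputs: tuple
    outputs: tuple


@dataclass
class ProvenanceStore:
    """Hypothetical stand-alone store; it knows nothing about the workflow engine."""
    records: list = field(default_factory=list)

    def record(self, activity, inputs, outputs):
        # Called by the application as execution proceeds.
        self.records.append(ProcessRecord(activity, tuple(inputs), tuple(outputs)))

    def provenance_of(self, item):
        """Return the activities that led to `item`, nearest first."""
        lineage, frontier, seen = [], [item], set()
        while frontier:
            current = frontier.pop(0)
            for rec in self.records:
                if current in rec.outputs and rec not in seen:
                    seen.add(rec)
                    lineage.append(rec.activity)
                    frontier.extend(rec.inputs)
        return lineage


# Illustrative run: any execution technology could issue these calls.
store = ProvenanceStore()
store.record("align", ["scan.img"], ["aligned.img"])
store.record("average", ["aligned.img"], ["atlas.img"])
store.record("slice", ["atlas.img"], ["atlas_x.pgm"])
lineage = store.provenance_of("atlas_x.pgm")
```

Because the store only sees records of the form (activity, inputs, outputs), the same query works regardless of which technology executed each step, which is the point of keeping the provenance model technology independent.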
All provenance systems rely on some form of database management system
to store their data; RDF and SQL stores were the preferred technologies.
Associated query languages are used to express provenance queries, but some
systems use query templates and query interfaces that are specifically provenance
oriented, helping users to express their provenance questions precisely and easily
without having to understand the underlying schemas adopted
by the implementations.
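A query template over an SQL store might look like the following sketch, which uses Python's standard sqlite3 module. The table layout, column names, template, and sample rows are assumptions made for illustration and do not reproduce the schema of any system mentioned above.

```python
import sqlite3

# Hypothetical schema: one row per (activity, item used, item produced).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE process_doc (activity TEXT, used_item TEXT, produced_item TEXT)"
)
conn.executemany(
    "INSERT INTO process_doc VALUES (?, ?, ?)",
    [
        ("align", "scan.img", "aligned.img"),
        ("average", "aligned.img", "atlas.img"),
        ("slice", "atlas.img", "atlas_x.pgm"),
    ],
)

# Provenance-oriented template: the user supplies only :item and never
# needs to know the table or column names.
LINEAGE_TEMPLATE = """
WITH RECURSIVE lineage(activity, used_item, depth) AS (
    SELECT activity, used_item, 1
    FROM process_doc
    WHERE produced_item = :item
    UNION
    SELECT p.activity, p.used_item, l.depth + 1
    FROM process_doc AS p
    JOIN lineage AS l ON p.produced_item = l.used_item
)
SELECT activity FROM lineage ORDER BY depth
"""


def activities_behind(connection, item):
    """Fill in the template and return the contributing activities, nearest first."""
    rows = connection.execute(LINEAGE_TEMPLATE, {"item": item}).fetchall()
    return [row[0] for row in rows]


result = activities_behind(conn, "atlas_x.pgm")
```

The recursive query stays hidden behind `activities_behind`, so a user asks "what led to this item?" without writing SQL; an RDF store could offer the same kind of template over a SPARQL query.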