As a result of the scale, users need to decide which data to keep (for example,
in high-energy physics only selected and already preprocessed collision events
are cataloged). When storing provenance, decisions about what to store need to
be made as well. Because it is often hard to predict what will be needed in
the future, sometimes data and related information are irrevocably lost.
Because of the size of the collaborations and datasets, data, metadata, and
provenance information are often not stored at the same location, within the
same system. Thus issues of information federation arise. The issue is
arguably less severe for provenance than for metadata. Provenance is, to
some degree, inherently distributed, with information about the data coming from
different sources, and it also has explicit links (such as those in the provenance
graph) that allow one to follow the provenance trail. Additionally, once the
process documentation of an item is generated, it will most likely not change
since it is a historical record. On the other hand, metadata about data items
may change with time, or a piece of data may be found invalid. As a result,
metadata requires more effort in the area of consistency management.
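The role of explicit links in following a provenance trail can be illustrated with a small sketch. It assumes a hypothetical in-memory provenance graph in which each data item maps to the items it was derived from. The file names, the DERIVED_FROM table, and the provenance_trail function are invented for illustration and do not correspond to any particular provenance system. In a federated setting each lookup might go to a different store, and the dictionary here simply stands in for that.

```python
from collections import deque

# Hypothetical provenance links: each item maps to the items it was derived from.
# In a distributed deployment these entries could live in different systems.
DERIVED_FROM = {
    "mosaic.fits": ["projected_1.fits", "projected_2.fits"],
    "projected_1.fits": ["raw_1.fits"],
    "projected_2.fits": ["raw_2.fits"],
    "raw_1.fits": [],
    "raw_2.fits": [],
}


def provenance_trail(item, links):
    """Return every ancestor of `item`, breadth-first, by following derivation links."""
    seen = []
    queue = deque(links.get(item, []))
    while queue:
        current = queue.popleft()
        if current in seen:
            continue  # an item reachable by two paths is reported once
        seen.append(current)
        queue.extend(links.get(current, []))
    return seen


trail = provenance_trail("mosaic.fits", DERIVED_FROM)
print(trail)
```

Because process documentation is a historical record, a table like this would only ever grow; no entry needs to be updated in place, which is part of why consistency management is lighter for provenance than for metadata.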
The need to share data results in many challenges. First, communities need
to agree on metadata standards. Then, these standards need to be followed by
data publishers and software systems so that a consistent view of metadata
is maintained. When data are shared across communities, mediation between
metadata schemas needs to be performed. The challenge for cross-project or
cross-community interoperability is not only technical but also social. How
does one motivate scientists to provide the necessary metadata about the
primary and derived data? What is the incentive to retrofit the codes and
publish the data into community repositories?
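The mediation between metadata schemas mentioned above can be sketched, under simplifying assumptions, as a field-name mapping into a shared vocabulary. The two community schemas, their field names, and the mediate function below are hypothetical and are not drawn from any real metadata standard. Real mediation usually also has to reconcile units, value vocabularies, and structure.

```python
# Hypothetical mappings from two community schemas to a shared vocabulary.
# None of these field names come from a real standard.
CLIMATE_TO_COMMON = {"var_name": "variable", "t_start": "start_time"}
ASTRO_TO_COMMON = {"band": "variable", "date_obs": "start_time"}


def mediate(record, mapping):
    """Translate a record into the common schema; return unmapped fields separately."""
    common, unmapped = {}, {}
    for key, value in record.items():
        if key in mapping:
            common[mapping[key]] = value
        else:
            unmapped[key] = value
    return common, unmapped


climate_record = {"var_name": "surface_temperature", "t_start": "2000-01-01", "grid": "T42"}
astro_record = {"band": "K", "date_obs": "2001-03-15"}

climate_common, climate_unmapped = mediate(climate_record, CLIMATE_TO_COMMON)
astro_common, astro_unmapped = mediate(astro_record, ASTRO_TO_COMMON)
print(climate_common, climate_unmapped)
print(astro_common, astro_unmapped)
```

Returning the unmapped fields instead of silently dropping them makes gaps between the schemas visible, which is where agreement between communities is still needed.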
In general, future work should focus on extensible metadata and provenance
systems that follow common standards that are independent of the systems
that use them, and can be shared across distributed collaborations. Such
systems should support common languages for responding to provenance queries.
There is already good progress, but unified metadata and provenance systems
for scientific communities are a long way off.
Acknowledgments
Ewa Deelman's work was funded by the National Science Foundation under
Cooperative Agreement OCI-0438712 and grant CCF-0725332. Bruce Berriman
is supported by the NASA Multi Mission Archive and by the NASA
Exoplanet Science Institute at the Infrared Processing and Analysis Center,
operated by the California Institute of Technology in coordination with the
Jet Propulsion Laboratory (JPL). Oscar Corcho's work was funded by the
SemsorGrid4Env project (FP7-ICT-223913). The authors would like to thank
members of the Earth System Grid for the use of their metadata example.