12.1 Metadata and Provenance
Today data are being collected by a vast number of instruments in every
discipline of science. In addition to raw data, new products are created every
day as a result of processing existing data and running simulations in order to
understand observed data. As the sizes of the datasets grow into the petascale
range, and as data are being shared among and across scientific communities,
the importance of diligently recording the meaning of data and the way they
were produced increases dramatically.
One can think of metadata as data descriptions that assign meaning to the
data, and data provenance as the information about how the data were derived.
Both are critical to the ability to interpret a particular data item. Even when
the same individual is collecting the data and interpreting them, metadata and
provenance are important. However, today, the key drivers for the capture and
management of data descriptions are the scientific collaborations that bring
collective knowledge and resources to solve a particular problem or explore a
research area. Because sharing data in collaborations is essential, these data
need to contain enough information for other members of the collaboration
to interpret them and then use them for their own research. Metadata and
provenance information are also important for the automation of scientific
analysis, where software needs to be able to identify the datasets appropriate
for a particular analysis and then annotate new, derived data with metadata
and provenance information.
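The automation scenario above can be sketched in a few lines. The following is a minimal Python sketch, assuming a simple in-memory catalog. The names `DataItem`, `MetadataCatalog`, and `derive`, and the example attributes (`instrument`, `band`, `level`), are hypothetical and do not come from any particular system. Software selects its inputs by querying their metadata. It then records provenance (input names, processing step, and creation time) on the derived product and registers that product so that later analyses can discover it.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, List


@dataclass
class DataItem:
    """Hypothetical data item: metadata gives it meaning, provenance records its origin."""
    name: str
    metadata: Dict[str, str]
    provenance: Dict[str, object] = field(default_factory=dict)


class MetadataCatalog:
    """Hypothetical in-memory catalog mapping dataset names to DataItems."""

    def __init__(self) -> None:
        self._items: Dict[str, DataItem] = {}

    def register(self, item: DataItem) -> None:
        self._items[item.name] = item

    def find(self, **criteria: str) -> List[DataItem]:
        # Return items whose metadata matches every requested attribute value.
        return [
            item for item in self._items.values()
            if all(item.metadata.get(k) == v for k, v in criteria.items())
        ]


def derive(catalog: MetadataCatalog, inputs: List[DataItem], process: str,
           name: str, **metadata: str) -> DataItem:
    """Create a derived item annotated with metadata and provenance, then register it."""
    item = DataItem(
        name=name,
        metadata=dict(metadata),
        provenance={
            "inputs": [i.name for i in inputs],   # which data it came from
            "process": process,                   # how it was produced
            "created": datetime.now(timezone.utc).isoformat(),
        },
    )
    catalog.register(item)
    return item


# Usage: pick inputs by metadata, derive a product, and record its provenance.
catalog = MetadataCatalog()
catalog.register(DataItem("obs-001", {"instrument": "telescope-A", "band": "r"}))
catalog.register(DataItem("obs-002", {"instrument": "telescope-A", "band": "g"}))
inputs = catalog.find(instrument="telescope-A", band="r")
mosaic = derive(catalog, inputs, process="mosaic-v1", name="mosaic-r",
                band="r", level="derived")
print(mosaic.provenance["inputs"])  # ['obs-001']
```

Because the derived product is registered in the same catalog, a subsequent query such as `catalog.find(band="r")` returns both the raw observation and the mosaic. Its provenance entry then lets another collaborator trace the mosaic back to `obs-001`.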
[Figure 12.1 The data lifecycle. Components shown: metadata catalogs, provenance catalogs, data replica catalogs, software catalogs, software component libraries, data discovery, data processing, and data movement services.]