10.1 Introduction
The past decade has witnessed a dramatic increase in the volume of scientific
data generated in the physical, earth, and life sciences. This development is
primarily a result of major advancements in sensor technology, surveying
techniques, computer-based simulations, and instrumentation of experiments. As
Szalay and Gray note [1], the amount of scientific data generated in these
disciplines is estimated to be doubling every year. Organizations in
government, industry, academia, and the private sector have made significant
investments in infrastructures to collect and maintain scientific data and
make them accessible to the public. Good examples of such efforts are the
Sloan Digital Sky Survey in astronomy [2], the GDB Human Genome Database and
Entrez Genome Database in genomics [3, 4], and the Global Biodiversity
Information Facility in ecology [5], to name only a few.
More and more such domain-specific data management infrastructures are being
built to give users easy access to scientific data, often through
comprehensive Web portals. However, a key challenge is to provide users with
effective means to integrate data from diverse sources to facilitate data
exploration and analysis tasks. Data integration is one of the more
traditional yet still very active fields in the area of databases and data
management. It is concerned with models, techniques, and architectures that
provide users with a uniform logical view of, and transparent access to,
physically distributed and often heterogeneous data sources [6-9]. Data
integration is a key theme in many e-commerce and e-business IT
infrastructures, where it is often called enterprise information
integration [10]. In these application domains, the objective is to integrate
business and consumer data from different transactional databases in order to
obtain new information that drives business activities and decisions. Today,
several commercial and open-source data integration platforms help businesses
integrate (typically relational) data from transactional databases, leading
to data warehouse and federated database architectures.
It seems natural to apply the techniques realized in those business-oriented
data integration platforms to scientific data collections as well. However,
because of the complexity, unprecedented quantities, and diversity of
scientific data, traditional schema-based approaches to data integration are
in general not applicable. In many scientific application domains, there is
often no single conceptual schema that can be developed from the data and
schemas associated with the individual data sources to be integrated.
Furthermore, scientific data integration often occurs in an ad hoc fashion.
For example, data relevant to evaluating a scientific hypothesis need to be
discovered and dynamically integrated into often complex data analysis and
exploration tasks, without the data used in these tasks having to be stored
persistently. The problem many scientists face today is how to make effective
use of the ever-increasing number of data repositories.