Interoperability and Data Integration in the Geosciences - Scientific Data Management

10.1 Introduction

The past decade has witnessed a dramatic increase in the scientific data being generated in the physical, earth, and life sciences. This development is primarily a result of major advancements in sensor technology, surveying techniques, computer-based simulations, and the instrumentation of experiments. As stated by Szalay and Gray [1], it is estimated that the amount of scientific data generated in these disciplines is now doubling every year. Organizations in government and industry, as well as in the academic and private sectors, have made significant investments in infrastructures to collect and maintain scientific data and make them accessible to the public. Good examples of such efforts are the Sloan Digital Sky Survey in astronomy [2], the GDB Human Genome Database and Entrez Genome Database in genomics [3, 4], and the Global Biodiversity Information Facility in ecology [5], to name only a few.

More and more such domain-specific data management infrastructures are being built to give users easy access to scientific data, often through comprehensive Web portals. However, a key challenge is to provide users with effective means to integrate data from diverse sources in order to facilitate data exploration and analysis tasks. Data integration is one of the more traditional yet still very active fields in the area of databases and data management. It is concerned with models, techniques, and architectures that provide users with a uniform logical view of, and transparent access to, physically distributed and often heterogeneous data sources [6-9]. Data integration is a key theme in many e-commerce and e-business IT infrastructures, where it is often called enterprise information integration [10]. In these application domains, the objective is to integrate business and consumer data from different transactional databases in order to obtain new information that drives business activities and decisions. Several commercial and open-source data integration platforms now exist that help businesses integrate (typically relational) data from transactional databases, leading to data warehouse and federated database architectures.

It seems natural to apply the techniques realized in those business-oriented data integration platforms to scientific data collections as well. However, because of the complexity, unprecedented quantities, and diversity of scientific data, traditional schema-based approaches to data integration are in general not applicable. In many scientific application domains, there is often no single conceptual schema that can be developed from the data and schemas associated with the individual data sources to be integrated. Furthermore, scientific data integration often occurs in an ad hoc fashion. For example, data relevant to evaluating a scientific hypothesis must be discovered and dynamically integrated into often complex data analysis and exploration tasks, without the data used in these tasks having to be stored persistently. The problem many scientists face today is how to make effective use of the ever-increasing number of data repositories.
