Introduction - Uncertain Schema Matching


CHAPTER 1

Introduction

Doubt is not a pleasant condition, but certainty is absurd.

- Voltaire

Schema matching is the task of providing correspondences between concepts describing the meaning of data in various heterogeneous, distributed data sources (e.g., attributes in database schemata, tags in XML DTDs, fields in HTML forms, and input and output parameters in Web services). Schema matching is one of the basic operations required by the process of data and schema integration [Batini et al., 1986, Bernstein and Melnik, 2004, Lenzerini, 2002], and thus strongly affects its outcomes, whether these involve targeted content delivery, view integration, database integration, query rewriting over heterogeneous sources, duplicate data elimination, or automatic streamlining of workflow activities that involve heterogeneous data sources. As such, schema matching affects numerous modern applications from a wide variety of areas. It impacts business, where company data sources continuously realign due to changing markets, and it affects the way businesses and other information consumers seek information over the Web. It impacts the life sciences, where scientific workflows cross system boundaries more often than not. Finally, it impacts the way communities of knowledge are created and evolve.
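To make the notion of a correspondence concrete, the following is a minimal sketch of a purely name-based matching heuristic. It is an illustration only, not the method of any system cited in this chapter: the function names, the threshold value, and the two example schemata are hypothetical, and plain string similarity (via Python's standard `difflib`) stands in for the much richer evidence that real tools combine.

```python
from difflib import SequenceMatcher


def name_similarity(a: str, b: str) -> float:
    """Return a case-insensitive string similarity score in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def match_schemas(source, target, threshold=0.6):
    """Pair each source attribute with its most similar target attribute.

    A pair is kept only if its similarity reaches the (assumed) threshold.
    Each result is a tuple (source attribute, target attribute, score).
    """
    correspondences = []
    for s in source:
        # Pick the target attribute whose name is closest to s.
        best = max(target, key=lambda t: name_similarity(s, t))
        score = name_similarity(s, best)
        if score >= threshold:
            correspondences.append((s, best, score))
    return correspondences


# Hypothetical attribute lists from two heterogeneous sources.
orders = ["customer_name", "ship_date", "total"]
invoices = ["custName", "shippingDate", "amount"]

for s, t, score in match_schemas(orders, invoices, threshold=0.5):
    print(f"{s} <-> {t} (similarity {score:.2f})")
```

Each returned pair carries a score rather than a yes/no answer, which hints at why the output of such a heuristic is best treated as uncertain: a high score suggests a correspondence but does not guarantee one, and a semantically correct pair such as `total` and `amount` may score poorly on names alone.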

Schema matching research has been going on for more than 25 years now, first as part of schema integration and then as a standalone research field (see the surveys [Batini et al., 1986, Rahm and Bernstein, 2001, Sheth and Larson, 1990, Shvaiko and Euzenat, 2005] and online lists, e.g., OntologyMatching¹ and Ziegler²). Over the years, a significant body of work has been devoted to the identification of schema matchers, that is, heuristics for schema matching. The main objective of schema matchers is to provide correspondences that are effective from the user's point of view, yet computationally efficient (or at least not disastrously expensive). Examples of algorithmic tools used for schema matching include COMA [Do and Rahm, 2002], Cupid [Madhavan et al., 2001], OntoBuilder [Gal et al., 2005b], Autoplex [Berlin and Motro, 2001], Similarity Flooding [Melnik et al., 2003], Clio [Miller et al., 2001], Glue [Doan et al., 2002], and others [Bergamaschi et al., 2001, Castano et al., 2001, Saleem et al., 2007]. These have come out of a variety of research communities, including database management, information retrieval, the information sciences, data semantics and the semantic Web, and others. Research papers from different communities have

1 http://www.ontologymatching.org/

2 http://www.ifi.unizh.ch/~pziegler/IntegrationProjects.html
