Towards Large-Scale Schema and Ontology Matching - Schema Matching and Mapping

matching a few small schemas, it is enormously time-consuming and error-prone when dealing with large schemas encompassing thousands of elements or when matching many schemas. Therefore, automatic or semiautomatic approaches that find semantic correspondences with minimal manual effort are especially needed for large-scale matching. Typical use cases of large-scale matching include:

- Matching large XML schemas, e.g., e-business standards and message formats (Rahm et al. 2004; Smith et al. 2009)
- Matching large life science ontologies describing and categorizing biomedical objects or facts such as genes, the anatomy of different species, diseases, etc. (Kirsten et al. 2007; Zhang et al. 2007)
- Matching large web directories or product catalogs (Avesani et al. 2005; Nandi and Bernstein 2009)
- Matching many web forms of deep web data sources to create a mediated search interface, e.g., for travel reservation or shopping for certain products (He and Chang 2006; Su et al. 2006)

Schema matching (including its ontology matching variant) has been a very active research area, especially in the last decade, and numerous techniques and prototypes for automatic matching have been developed (Rahm and Bernstein 2001; Euzenat and Shvaiko 2007). Schema matching has also been used as a first step to solve data exchange, schema evolution, or data integration problems, e.g., to transform correspondences into an executable mapping for migrating data from a source to a target schema (Fagin et al. 2009). Most match approaches focus on 2-way or pairwise schema matching, where two related input schemas are matched with each other. Some algorithms have also been proposed for n-way or holistic schema matching (He and Chang 2006), which determines the semantic overlap in many schemas, e.g., to build a mediated schema. The result of pairwise schema matching is usually an equivalence mapping containing the identified semantic correspondences, i.e., pairs of semantically equivalent schema elements. Some ontology matching approaches also try to determine other kinds of correspondences, such as is-a relationships between ontologies (Spiliopoulos et al. 2010). Due to the typically high semantic heterogeneity of schemas, algorithms can only determine approximate mappings. The automatically determined mappings may thus require inspection and adaptation by a human domain expert (deletion of wrong correspondences, addition of missed correspondences) to obtain the correct mapping.
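The pairwise setting described above can be illustrated with a minimal sketch. It is not the method of any cited system: the string-similarity measure, the threshold of 0.7, the function names, and the element names are all assumptions chosen for illustration. The sketch compares every element of one schema with every element of the other, so the number of comparisons is the product of the two schema sizes, and it returns only candidate correspondences.

```python
from difflib import SequenceMatcher
from itertools import product


def name_similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity in [0, 1] (an illustrative choice)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def match_pairwise(source, target, threshold=0.7):
    """Compare every source element name with every target element name.

    Returns (correspondences, comparisons): the candidate pairs whose
    similarity reaches the (assumed) threshold, and the number of
    comparisons performed, which is len(source) * len(target).
    """
    correspondences = []
    comparisons = 0
    for s, t in product(source, target):
        comparisons += 1
        score = name_similarity(s, t)
        if score >= threshold:
            correspondences.append((s, t, round(score, 2)))
    return correspondences, comparisons


# Hypothetical element names, invented for this sketch.
source_schema = ["CustomerName", "ZipCode", "OrderDate"]
target_schema = ["customer_name", "PostalCode", "Price", "order_date"]

candidates, comparisons = match_pairwise(source_schema, target_schema)
print(comparisons)  # 3 source elements x 4 target elements
for s, t, score in candidates:
    print(s, "~", t, score)
```

A purely name-based comparison like this one can miss synonyms and report false matches, which is why the resulting candidates would still need review by a domain expert.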

Despite the advances made, current match systems still struggle to deal with large-scale match tasks such as those mentioned above. In particular, good effectiveness and good efficiency are two major challenges for large-scale schema matching. Effectiveness (high match quality) requires the correct and complete identification of semantic correspondences, and the larger the search space, the more difficult this is to achieve. For pairwise schema matching, the search space increases at least quadratically with the number of elements. Furthermore, the semantic heterogeneity is typically high for large-scale match tasks, e.g., the schemas may differ substantially in their size and scope, making it difficult to find all correspondences. Moreover, elements often have several equivalent elements in the other schema that
