matching a few small schemas, it is enormously time-consuming and error-prone
when dealing with large schemas encompassing thousands of elements or when
matching many schemas. Therefore, automatic or semiautomatic approaches that find
semantic correspondences with minimal manual effort are especially needed for
large-scale matching. Typical use cases of large-scale matching include:
- Matching large XML schemas, e.g., e-business standards and message formats
  (Rahm et al. 2004; Smith et al. 2009)
- Matching large life science ontologies describing and categorizing biomedical
  objects or facts such as genes, the anatomy of different species, diseases, etc.
  (Kirsten et al. 2007; Zhang et al. 2007)
- Matching large web directories or product catalogs (Avesani et al. 2005; Nandi
  and Bernstein 2009)
- Matching many web forms of deep web data sources to create a mediated search
  interface, e.g., for travel reservation or shopping for certain products (He and
  Chang 2006; Su et al. 2006).
Schema matching (including its ontology matching variant) has been a very active
research area, especially in the last decade, and numerous techniques and prototypes
for automatic matching have been developed (Rahm and Bernstein 2001; Euzenat
and Shvaiko 2007). Schema matching has also been used as a first step to solve data
exchange, schema evolution, or data integration problems, e.g., to transform
correspondences into an executable mapping for migrating data from a source to a
target schema (Fagin et al. 2009). Most match approaches focus on 2-way or pairwise
schema matching, where two related input schemas are matched with each other.
Some algorithms have also been proposed for n-way or holistic schema matching
(He and Chang 2006), which determines the semantic overlap in many schemas, e.g.,
to build a mediated schema. The result of pairwise schema matching is usually an
equivalence mapping containing the identified semantic correspondences, i.e., pairs
of semantically equivalent schema elements. Some ontology matching approaches
also try to determine different kinds of correspondences, such as is-a relationships
between ontologies (Spiliopoulos et al. 2010). Due to the typically high semantic
heterogeneity of schemas, algorithms can only determine approximate mappings.
The automatically determined mappings may thus require inspection and adaptation
by a human domain expert (deletion of wrong correspondences, addition of
missed correspondences) to obtain the correct mapping.
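To make the notion of pairwise matching concrete, the following minimal sketch
compares every element of one schema with every element of another and keeps the
pairs whose names are sufficiently similar. It is an illustration only, not the
method of any system cited above: the function names, the example schemas, the
threshold of 0.7, and the use of a plain string similarity from Python's standard
library are all assumptions made for this example. Real match systems typically
combine several matchers (linguistic, structural, instance-based).

```python
from difflib import SequenceMatcher
from itertools import product


def name_similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity in [0, 1] (hypothetical matcher)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def match_pairwise(source, target, threshold=0.7):
    """Return candidate correspondences (source, target, score).

    Every source element is compared with every target element, so the
    number of comparisons is len(source) * len(target).
    """
    candidates = []
    for s, t in product(source, target):
        score = name_similarity(s, t)
        if score >= threshold:  # threshold is an assumed tuning parameter
            candidates.append((s, t, score))
    # Best-scoring candidates first, to ease review by a domain expert.
    return sorted(candidates, key=lambda c: c[2], reverse=True)


# Hypothetical element names, invented for this example.
source_schema = ["CustomerName", "ShipAddress", "OrderDate"]
target_schema = ["customer_name", "shipping_address", "date_of_order", "total"]

mapping = match_pairwise(source_schema, target_schema)
for s, t, score in mapping:
    print(f"{s} <-> {t} ({score:.2f})")
```

With these inputs, the purely name-based measure finds the first two
correspondences but misses the pair OrderDate / date_of_order, because the word
order differs. This is the kind of missed correspondence that a human expert would
have to add to the approximate mapping.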
Despite the advances made, current match systems still struggle to deal with
large-scale match tasks such as those mentioned above. In particular, achieving good
effectiveness and good efficiency are the two major challenges for large-scale schema
matching. Effectiveness (high match quality) requires the correct and complete
identification of semantic correspondences, and the larger the search space, the more
difficult this is to achieve. For pairwise schema matching, the search space increases
at least quadratically with the number of elements. Furthermore, the semantic
heterogeneity is typically high for large-scale match tasks, e.g., the schemas may
differ greatly in their size and scope, making it difficult to find all correspondences.
Moreover, elements often have several equivalent elements in the other schema that