Modeling Uncertain Schema Matching - Uncertain Schema Matching


Comparing Examples 3.7 and 3.8 raises two interesting observations. First, the modeling of second-line matchers can serve as a reference framework for comparing various research efforts in schema matching. For example, while combiners and match selectors are defined as separate types by Lee et al. [2007], Gal [2006] combined and redefined them. A second observation involves the goal of second-line matchers, which aim at improving the outcomes of first-line schema matchers and increasing their robustness. This idea is appealing since complementary matchers can potentially compensate for each other's weaknesses [Bernstein et al., 2004]. Gal [2006] has shown that a heuristic based on the top-K best schema matchings increased the precision of mappings by 25% on average, at the cost of a minor 8% reduction in recall.
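The precision/recall trade-off of such a top-K heuristic can be illustrated with a minimal Python sketch. It assumes one simple consensus rule: keep only the correspondences that appear in all K best matchings. This rule, the function names `consensus` and `precision_recall`, and the toy attribute names are hypothetical and are not necessarily the exact procedure of Gal [2006]. The top-K matchings are taken as given rather than computed.

```python
from typing import FrozenSet, List, Set, Tuple

# A correspondence pairs an attribute of one schema with an attribute of another.
Correspondence = Tuple[str, str]
Matching = FrozenSet[Correspondence]


def consensus(top_k: List[Matching]) -> Set[Correspondence]:
    """Hypothetical heuristic: keep correspondences shared by all K matchings."""
    if not top_k:
        return set()
    result = set(top_k[0])
    for matching in top_k[1:]:
        result &= matching
    return result


def precision_recall(found: Set[Correspondence],
                     exact: Set[Correspondence]) -> Tuple[float, float]:
    """Precision = |found & exact| / |found|; recall = |found & exact| / |exact|.

    By convention (an assumption here), an empty set scores 1.0.
    """
    hits = len(found & exact)
    precision = hits / len(found) if found else 1.0
    recall = hits / len(exact) if exact else 1.0
    return precision, recall


# Invented toy data: a reference matching and the three best matchings.
exact = {("name", "fullName"), ("city", "town"), ("zip", "postcode"), ("tel", "phone")}
top_3 = [
    frozenset({("name", "fullName"), ("city", "town"), ("zip", "postcode"), ("tel", "fax")}),
    frozenset({("name", "fullName"), ("city", "town"), ("zip", "fax"), ("tel", "phone")}),
    frozenset({("name", "fullName"), ("city", "town"), ("zip", "postcode"), ("tel", "phone")}),
]

print(precision_recall(set(top_3[0]), exact))    # (0.75, 0.75): best matching alone
print(precision_recall(consensus(top_3), exact))  # (1.0, 0.5): consensus of the top 3
```

On this toy input, the consensus discards the wrong pair (tel, fax). It also drops a correct pair that is not shared by every matching, so precision rises while recall falls. That is the kind of trade-off reported above, though the numbers here are illustrative only.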

Table 3.4: Two-dimensional matcher classification

Matcher           First-Line Matcher    Second-Line Matcher
Non-decisive      Term                  Combined
Decision maker    -                     MWBG

We now propose yet another classification of matchers along two orthogonal dimensions (see Table 3.4 for the classification and example matchers). The first dimension separates first-line from second-line schema matchers. The second dimension separates matchers that aim at specifying schema matchings, dubbed decision makers, from those that compute similarity values yet do not make decisions at the schema level. Using Definition 3.5, we can say that a matcher is decisive if it satisfies . The most common type is a non-decisive first-line matcher. OntoBuilder's Term matcher belongs to this class, as does a WordNet-based decision tree technique proposed by Embley et al. [2002]. Combiners, in COMA's terminology, are non-decisive second-line schema matchers. They combine the similarity matrices of other matchers, and hence they are second-line matchers by definition. However, their similarity matrix is not meant to be used to decide on a single schema matching.
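The difference between a non-decisive first-line matcher and a combiner can be sketched in a few lines of Python. This is a hypothetical illustration, not OntoBuilder's or COMA's code. `term_matcher` uses `difflib.SequenceMatcher` as a stand-in string similarity, and `prefix_matcher` is an invented second first-line matcher. `combine` is a weighted average in the spirit of a combiner. All three return similarity matrices with values in [0, 1], and none of them selects a schema matching.

```python
from difflib import SequenceMatcher
from typing import List

Matrix = List[List[float]]


def term_matcher(source: List[str], target: List[str]) -> Matrix:
    """First-line, non-decisive (stand-in): string similarity of attribute names."""
    return [[SequenceMatcher(None, s.lower(), t.lower()).ratio() for t in target]
            for s in source]


def prefix_matcher(source: List[str], target: List[str]) -> Matrix:
    """Hypothetical first-line matcher: 1.0 if the first three letters agree."""
    return [[1.0 if s[:3].lower() == t[:3].lower() else 0.0 for t in target]
            for s in source]


def combine(matrices: List[Matrix], weights: List[float]) -> Matrix:
    """Second-line, non-decisive: weighted average of other matchers' matrices."""
    total = sum(weights)
    rows, cols = len(matrices[0]), len(matrices[0][0])
    return [[sum(w * m[i][j] for m, w in zip(matrices, weights)) / total
             for j in range(cols)]
            for i in range(rows)]


source = ["cityName", "zipCode"]
target = ["city", "postalCode", "zip"]
combined = combine([term_matcher(source, target), prefix_matcher(source, target)],
                   [0.5, 0.5])
for attr, row in zip(source, combined):
    print(attr, [round(v, 2) for v in row])
```

The combined matrix is still a full matrix of scores. Deciding which pairs form the schema matching is left to a decision maker.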

Well-known decisive second-line matchers include algorithms such as MWBG and SM. Both algorithms fall into the category of constraint enforcers as described by Lee et al. [2007], and both enforce a 1:1 cardinality constraint. Finally, the class of first-line decision makers contains few, if any, matchers. The main reason is that most systems abide by the long-standing conceptual modeling tradition of database schema integration, as summarized by Batini et al. [1986]: “The comparison activity focuses on primitive objects first...; then it deals with those modeling constructs that represent associations among primitive objects.” This dichotomy has largely been preserved in schema matching as well.
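The following minimal Python sketch shows how two decisive second-line matchers can turn a similarity matrix produced by another matcher into a 1:1 schema matching. It assumes that MWBG means selecting the 1:1 matching of maximum total similarity (maximum weight bipartite matching) and that SM means Gale-Shapley stable marriage, where each side prefers partners with higher similarity. The function names are hypothetical. The MWBG part uses brute force over permutations and is viable only for tiny schemas. A practical version might instead call `scipy.optimize.linear_sum_assignment(sim, maximize=True)`.

```python
from itertools import permutations
from typing import Dict, List

Matrix = List[List[float]]


def mwbg(sim: Matrix) -> Dict[int, int]:
    """1:1 matching (row -> column) maximizing total similarity, by brute force."""
    rows, cols = len(sim), len(sim[0])
    if rows > cols:
        raise ValueError("sketch assumes no more source than target attributes")
    best, best_score = None, float("-inf")
    for perm in permutations(range(cols), rows):
        score = sum(sim[i][perm[i]] for i in range(rows))
        if score > best_score:
            best, best_score = perm, score
    return {i: best[i] for i in range(rows)}


def stable_marriage(sim: Matrix) -> Dict[int, int]:
    """Gale-Shapley: source attributes (rows) propose in decreasing similarity."""
    rows, cols = len(sim), len(sim[0])
    prefs = [sorted(range(cols), key=lambda j: -sim[i][j]) for i in range(rows)]
    next_choice = [0] * rows
    engaged_to: Dict[int, int] = {}  # target column -> source row
    free = list(range(rows))
    while free:
        i = free.pop()
        if next_choice[i] == cols:
            continue  # rejected by every target: i stays unmatched
        j = prefs[i][next_choice[i]]
        next_choice[i] += 1
        if j not in engaged_to:
            engaged_to[j] = i
        elif sim[i][j] > sim[engaged_to[j]][j]:
            free.append(engaged_to[j])  # target j prefers i; old partner is freed
            engaged_to[j] = i
        else:
            free.append(i)  # j rejects i; i will propose to its next choice
    return {i: j for j, i in engaged_to.items()}


sim = [[0.9, 0.8],
       [0.8, 0.1]]
print(mwbg(sim))             # {0: 1, 1: 0}
print(stable_marriage(sim))  # {0: 0, 1: 1}
```

On this matrix, the two enforcers disagree. MWBG gives up the strongest pair (0, 0) to reach the highest total (1.6). SM keeps it because both sides prefer each other, which is what stability requires. Both outputs respect the 1:1 constraint.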

As a concluding remark, we compare the proposed classification with the classifications of Rahm and Bernstein [2001] and Euzenat and Shvaiko [2007]. Rahm and Bernstein [2001] partition matchers into individual matchers and combining matchers. The latter class contains only second-line schema matchers, but individual matchers can also serve as second-line matchers. For example, a matcher that takes the outcome of another matcher and applies a threshold condition to it is an individual, second-line matcher. Combining matchers are further partitioned into composite and hybrid matchers, a classification that is less relevant in our classification system, where the sec-
