Modeling Uncertain Schema Matching - Uncertain Schema Matching

A variation of the SM matcher is the Dominants matcher. This matcher chooses dominant
pairs, those pairs in the similarity matrix with maximum value in both their row and their
column. The main assumption guiding this heuristic is that the dominant pairs are the most
likely to be attribute correspondences, since the two attributes involved in a dominant pair
prefer each other most. It is worth noting that with this heuristic not all the target attributes
are mapped and that an attribute in one schema may be mapped to more than one attribute
in another schema, whenever attribute pairs share the same similarity level.
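
The selection rule of the Dominants matcher can be illustrated with a minimal sketch. The function name dominant_pairs, the list-of-lists matrix representation (rows for source attributes, columns for target attributes), and the sample values are assumptions made for illustration only; they do not come from the text.

```python
def dominant_pairs(sim):
    """Return the (row, column) pairs whose similarity value is maximal
    in both their row and their column of the similarity matrix."""
    if not sim or not sim[0]:
        return []
    row_max = [max(row) for row in sim]
    col_max = [max(col) for col in zip(*sim)]
    return [
        (i, j)
        for i, row in enumerate(sim)
        for j, value in enumerate(row)
        if value == row_max[i] and value == col_max[j]
    ]


# Hypothetical similarity matrix: 3 source attributes x 4 target attributes.
sim = [
    [0.9, 0.2, 0.1, 0.1],
    [0.3, 0.8, 0.8, 0.2],
    [0.4, 0.5, 0.6, 0.5],
]

print(dominant_pairs(sim))  # [(0, 0), (1, 1), (1, 2)]
```

In this made-up matrix the second source attribute is paired with two target attributes because of a tie at 0.8, while the fourth target attribute and the third source attribute take part in no dominant pair, which mirrors the two observations made above.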

Finally, Marie and Gal [2007b] have introduced 2LNB, a 2LM that uses a naïve Bayes
classifier over matrices to determine attribute correspondences. Autoplex [Berlin and Motro,
2001], LSD [Doan et al., 2001], iMAP [Dhamankar et al., 2004], and sPLMap
[Nottelmann and Straccia, 2007] also use a naïve Bayes classifier to learn attribute
correspondence probabilities using an instance training set. 2LNB is the only 2LM in this group.

We now provide two more examples of second-line matchers, highlighting the differences in
their modus operandi from first-line matchers.

Example 3.7 eTuner A model of a 1:1 matching system was defined by Lee et al. [2007] to
be a triple, one element being a library of matching components. This library has four types of
components, namely Matcher, Combiner, Constraint Enforcer, and Match Selector. The first type
is a 1LM, in its classical definition. The remaining three types are second-line schema matchers
according to our definition.

A combiner [Do and Rahm, 2002] follows the definition of a schema matcher with a null constraint
function, i.e., there are no constraints on the set of attribute correspondences in the output. A
combination can be made by aggregating elements of the input matrices or by using machine learning
techniques such as stacking and decision trees.

A constraint enforcer is simply a 2LM (note that our definition in Section 3.1.3 allows adding
constraints at first-line matchers as well).

A match selector returns a matrix in which all elements that are not selected are reduced to 0. Two
examples are given by Lee et al. [2007]: thresholding and the use of the MWBG algorithm for selecting
a maximum weighted bipartite graph.
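
To make the roles of a combiner and a match selector concrete, the following sketch aggregates two input matrices by a weighted average and then applies thresholding. The function names combine_average and select_by_threshold, the weighting scheme, and the sample matrices are all assumptions for illustration; they are not taken from eTuner or from Lee et al. [2007]. The MWBG-based selector is not shown.

```python
def combine_average(matrices, weights=None):
    """Combiner: weighted average of same-shaped similarity matrices.
    No constraints are placed on the resulting correspondences."""
    if not matrices:
        raise ValueError("need at least one matrix")
    if weights is None:
        weights = [1.0] * len(matrices)
    total = sum(weights)
    rows, cols = len(matrices[0]), len(matrices[0][0])
    return [
        [
            sum(w * m[i][j] for w, m in zip(weights, matrices)) / total
            for j in range(cols)
        ]
        for i in range(rows)
    ]


def select_by_threshold(sim, threshold):
    """Match selector: keep entries at or above the threshold and
    reduce every other entry to 0."""
    return [[v if v >= threshold else 0.0 for v in row] for row in sim]


# Hypothetical outputs of two first-line matchers over the same schema pair.
name_sim = [[0.8, 0.2], [0.4, 0.6]]
type_sim = [[1.0, 0.0], [0.0, 1.0]]

combined = combine_average([name_sim, type_sim])
selected = select_by_threshold(combined, 0.5)
```

Here the combiner output still contains a value for every attribute pair, and only the selector zeroes out the off-diagonal entries, which fall below the assumed threshold of 0.5.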

Example 3.8 Top-K A heuristic that utilizes the top-K best schema matchings to produce an
improved schema matching was proposed by Gal [2006] and will be described in depth in Section 5.4.
It is a special type of a combiner and a match selector, in which the input does not come from different
matchers (as is generally done with ensembles [Bernstein et al., 2004, Embley et al., 2002, Gal et al.,
2005b, Mork et al., 2006]). Rather, the same schema matcher generates multiple matrices that are
then evaluated to generate a single similarity matrix by a special form of thresholding.
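
The text does not spell out the thresholding rule at this point, so the sketch below shows only one plausible reading, not the heuristic of Gal [2006] itself. It assumes that each of the K matchings is a set of attribute pairs and that a pair survives when the share of matchings containing it reaches a cutoff. The function name consensus_from_top_k, the min_share parameter, and the sample data are hypothetical.

```python
def consensus_from_top_k(matchings, rows, cols, min_share=1.0):
    """Merge K matchings from one matcher into a single matrix.

    Each matching is a set of (row, column) pairs. Entry (i, j) of the
    result is the share of matchings containing the pair, reduced to 0
    when that share is below min_share (an assumed thresholding rule).
    """
    k = len(matchings)
    if k == 0:
        raise ValueError("need at least one matching")
    counts = [[0] * cols for _ in range(rows)]
    for matching in matchings:
        for i, j in matching:
            counts[i][j] += 1
    return [
        [(c / k if c / k >= min_share else 0.0) for c in row]
        for row in counts
    ]


# Hypothetical top-3 matchings over 3 source and 3 target attributes.
top_3 = [
    {(0, 0), (1, 1), (2, 2)},
    {(0, 0), (1, 2), (2, 1)},
    {(0, 0), (1, 1), (2, 2)},
]

strict = consensus_from_top_k(top_3, 3, 3)            # pairs in all 3 matchings
relaxed = consensus_from_top_k(top_3, 3, 3, 0.6)      # pairs in at least 60%
```

With the strict cutoff only the pair (0, 0) is retained, since it is the only one that appears in all three assumed matchings; lowering the cutoff also admits (1, 1) and (2, 2).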
