• A variation of the SM matcher is the Dominants matcher. This matcher chooses dominant
pairs, those pairs in the similarity matrix with maximum value in both their row and their
column. The main assumption guiding this heuristic is that the dominant pairs are the most
likely to be attribute correspondences since the two attributes involved in a dominant pair
prefer each other most. It is worth noting that with this heuristic not all the target attributes
are mapped and that an attribute in one schema may be mapped to more than one attribute
in another schema, whenever attribute pairs share the same similarity level.
• Finally, Marie and Gal [2007b] have introduced 2LNB, a 2LM that uses a naïve Bayes
classifier over matrices to determine attribute correspondences. Autoplex [Berlin and Motro,
2001], LSD [Doan et al., 2001], iMAP [Dhamankar et al., 2004], and sPLMap
[Nottelmann and Straccia, 2007] also use a naïve Bayes classifier to learn attribute
correspondence probabilities using an instance training set. 2LNB is the only 2LM in this group.
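The Dominants heuristic from the first item above can be illustrated with a minimal sketch. The function name `dominant_pairs`, the list-of-lists matrix representation, and the sample values are assumptions made for illustration only; exact equality is used to detect ties, which is adequate for this toy input but may need a tolerance on real similarity scores.

```python
from typing import List, Tuple


def dominant_pairs(sim: List[List[float]]) -> List[Tuple[int, int]]:
    """Return (row, col) pairs whose value is maximal in both its row and its column."""
    if not sim or not sim[0]:
        return []
    row_max = [max(row) for row in sim]
    col_max = [max(col) for col in zip(*sim)]  # zip(*sim) iterates over columns
    return [
        (i, j)
        for i, row in enumerate(sim)
        for j, value in enumerate(row)
        if value == row_max[i] and value == col_max[j]
    ]


# Hypothetical similarity matrix: rows are attributes of one schema,
# columns are attributes of the other.
similarity = [
    [0.9, 0.2, 0.1],
    [0.3, 0.7, 0.7],
    [0.8, 0.1, 0.4],
]
pairs = dominant_pairs(similarity)
print(pairs)
```

On this input the third row has no dominant pair (its best value, 0.8, is beaten in its column by 0.9), while the second row is mapped to two columns because of a tie at 0.7, which mirrors the two caveats noted in the text.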
We now provide two more examples of second-line matchers, highlighting the differences in
their modus operandi from first-line matchers.
Example 3.7 eTuner A model of a 1:1 matching system was defined by Lee et al. [2007] to
be a triple, one element being a library of matching components. This library has four types of
components, namely Matcher, Combiner, Constraint Enforcer, and Match Selector. The first type
is a 1LM, in its classical definition. The remaining three types are second-line schema matchers
according to our definition.
A combiner [Do and Rahm, 2002] follows the definition of a schema matcher with a null constraint
function, i.e., there are no constraints on the set of attribute correspondences in the output. A
combination can be made by aggregating elements of the input matrices or by using machine learning
techniques such as stacking and decision trees.
A constraint enforcer is simply a 2LM (note that our definition in Section 3.1.3 allows adding
constraints at first-line matchers as well).
A match selector returns a matrix in which all elements that are not selected are reduced to 0. Two
examples are given by Lee et al. [2007]: thresholding and the use of the MWBG algorithm for selecting
a maximum weighted bipartite graph.
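To make the combiner and match selector roles concrete, the following sketch chains the two on toy data. It is not eTuner's implementation: element-wise averaging is only one assumed way of "aggregating elements of the input matrices", and the function names, matrices, and threshold value are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def combine_average(matrices: List[Matrix]) -> Matrix:
    """Combiner (assumed aggregation): element-wise mean of the input matrices."""
    n = len(matrices)
    rows, cols = len(matrices[0]), len(matrices[0][0])
    return [
        [sum(m[i][j] for m in matrices) / n for j in range(cols)]
        for i in range(rows)
    ]


def select_by_threshold(sim: Matrix, threshold: float) -> Matrix:
    """Match selector: keep entries >= threshold, reduce all others to 0."""
    return [[v if v >= threshold else 0.0 for v in row] for row in sim]


# Hypothetical outputs of two first-line matchers over the same schema pair.
matcher_a = [[0.75, 0.25], [0.5, 0.5]]
matcher_b = [[0.25, 0.0], [0.25, 1.0]]

combined = combine_average([matcher_a, matcher_b])
selected = select_by_threshold(combined, threshold=0.5)
print(combined)
print(selected)
```

The combiner imposes no constraints on its output, whereas the selector zeroes every entry it does not keep. An MWBG-based selector would replace `select_by_threshold` with a maximum weighted bipartite matching step, which is omitted here.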
Example 3.8 Top-K A heuristic that utilizes the top-K best schema matchings to produce an
improved schema matching was proposed by Gal [2006] and will be described in depth in Section 5.4.
It is a special type of a combiner and a match selector, in which the input does not come from different
matchers (as is generally done with ensembles [Bernstein et al., 2004, Embley et al., 2002, Gal et al.,
2005b, Mork et al., 2006]). Rather, the same schema matcher generates multiple matrices that are
then evaluated to generate a single similarity matrix by a special form of thresholding.
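The passage does not spell out the "special form of thresholding", so the sketch below only illustrates the general shape of such a heuristic under a stated assumption: a correspondence is kept when it is selected in at least a given fraction of the K matchings. The rule, the name `top_k_frequency_select`, and the data are hypothetical and should not be read as the method of Gal [2006].

```python
from typing import List

Matrix = List[List[float]]


def top_k_frequency_select(matchings: List[Matrix], min_fraction: float) -> Matrix:
    """Assumed rule: keep a pair if it is selected in at least
    min_fraction of the K matchings; its score is that fraction."""
    k = len(matchings)
    rows, cols = len(matchings[0]), len(matchings[0][0])
    result = []
    for i in range(rows):
        row = []
        for j in range(cols):
            # Fraction of the K matchings in which pair (i, j) is selected.
            support = sum(1 for m in matchings if m[i][j] > 0) / k
            row.append(support if support >= min_fraction else 0.0)
        result.append(row)
    return result


# Four hypothetical binary matchings produced by the same matcher (K = 4).
top_k = [
    [[1, 0], [0, 1]],
    [[1, 0], [0, 0]],
    [[0, 1], [1, 0]],
    [[1, 0], [0, 1]],
]
merged = top_k_frequency_select(top_k, min_fraction=0.5)
print(merged)
```

Here all K inputs come from one matcher rather than from an ensemble, and the output is a single matrix in which unselected entries are reduced to 0, so the function acts as both a combiner and a match selector.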