Fig. 10.4 COMA++ user interface for selecting combination strategy
In SMB [Anan and Avigdor 2008], the output of a weak similarity measure
(called a first-line matcher) is combined with a decision maker (or second-line
matcher) to discover correspondences. The combination strategy depends on the
decision maker, which can be the Maximum Weighted Bipartite Graph algorithm,
Stable Marriage, etc.
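
The two-stage idea can be illustrated with a small sketch. It is not SMB's implementation: the similarity matrix is hypothetical, the function names are invented for illustration, and the maximum weighted bipartite matching is computed by brute force over permutations, which only works for tiny inputs (a real tool would use a polynomial-time algorithm such as the Hungarian method). The point is only that the same first-line output can lead to different correspondences depending on the second-line decision maker.

```python
from itertools import permutations

def max_weight_matching(sim):
    """Brute-force maximum weighted bipartite matching (small square matrices only)."""
    n = len(sim)
    best = max(permutations(range(n)),
               key=lambda p: sum(sim[i][p[i]] for i in range(n)))
    return {i: best[i] for i in range(n)}

def stable_marriage(sim):
    """Gale-Shapley: source elements propose to targets by decreasing similarity."""
    n = len(sim)
    # Each source ranks the targets from most to least similar.
    prefs = [sorted(range(n), key=lambda j: -sim[i][j]) for i in range(n)]
    next_choice = [0] * n   # index of the next target each source will propose to
    engaged_to = {}         # target index -> source index
    free = list(range(n))
    while free:
        i = free.pop()
        j = prefs[i][next_choice[i]]
        next_choice[i] += 1
        if j not in engaged_to:
            engaged_to[j] = i
        elif sim[i][j] > sim[engaged_to[j]][j]:
            # The target prefers the new proposer; the previous one becomes free.
            free.append(engaged_to[j])
            engaged_to[j] = i
        else:
            free.append(i)
    return {i: j for j, i in engaged_to.items()}

# Hypothetical first-line output: sim[i][j] is the similarity between
# source element i and target element j.
sim = [[0.9, 0.8],
       [0.8, 0.1]]

by_weight = max_weight_matching(sim)   # maximizes the total similarity
by_stability = stable_marriage(sim)    # no pair would rather be matched together
```

On this matrix the two decision makers disagree: the maximum-weight matching pairs each source with the other index (total 1.6), while stable marriage keeps the single strongest pair (0, 0) and accepts a weak second pair.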
In YAM [Duchateau et al. 2009a, b], the combination of similarity measures is
performed by a machine learning classifier. The authors consider that any
classifier is a matcher, since it classifies pairs of schema elements as relevant
or not. Thus, the combination of the similarity measures depends on the type of
classifier (decision tree, Bayes network, neural network, etc.).
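
To make the "classifier as matcher" view concrete, here is a minimal sketch, assuming a one-level decision tree (a decision stump) learned from a handful of labeled pairs. It is a toy stand-in and not YAM's code: the feature names `name_sim` and `type_sim`, the training pairs, and the functions `train_stump` and `classify` are all hypothetical.

```python
def train_stump(examples):
    """Learn a one-level decision tree over similarity values.

    examples: list of (features, label) where features maps a similarity
    measure name to its value and label is True for a relevant pair.
    Returns the (feature, threshold) with the fewest training errors.
    """
    best = None
    for feature in examples[0][0]:
        for threshold in sorted({x[feature] for x, _ in examples}):
            errors = sum((x[feature] >= threshold) != y for x, y in examples)
            if best is None or errors < best[0]:
                best = (errors, feature, threshold)
    return best[1], best[2]

def classify(stump, features):
    """Predict whether a pair of schema elements is a relevant correspondence."""
    feature, threshold = stump
    return features[feature] >= threshold

# Hypothetical labeled pairs of schema elements.
training = [
    ({"name_sim": 0.9, "type_sim": 1.0}, True),
    ({"name_sim": 0.8, "type_sim": 0.0}, True),
    ({"name_sim": 0.3, "type_sim": 1.0}, False),
    ({"name_sim": 0.2, "type_sim": 0.0}, False),
]

stump = train_stump(training)
likely_match = classify(stump, {"name_sim": 0.85, "type_sim": 0.0})
likely_non_match = classify(stump, {"name_sim": 0.4, "type_sim": 1.0})
```

Here the learned rule replaces a hand-tuned weighted formula: on this data the stump ends up testing only `name_sim`, because `type_sim` does not separate the relevant pairs from the irrelevant ones.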
To sum up, many tools have designed their own strategies to combine similarity
measures. However, most of them are based on weighted functions that the users
may have to tune.
5.2 Weights in Formulas
Previously, we have detailed different types of strategies. One of the most
common strategies in the matching community is linear regression to aggregate
the values computed by similarity measures. In that case, the weight given to each
measure matters, and the appropriate value depends on the domain and the schemas
to be matched. For instance, if a domain ontology is available, one may decide to
give a high weight to the measures that are able to use this ontology. However,
tuning these weights manually still requires user expertise.
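
The effect of the weights can be seen in a short sketch of a weighted linear aggregation. The measure names, the scores, and both weight settings are made up for illustration, and normalizing by the sum of the weights is one common choice rather than something prescribed by the text.

```python
def aggregate(scores, weights):
    """Weighted linear combination of similarity values, normalized by total weight."""
    total = sum(weights[name] for name in scores)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(weights[name] * value for name, value in scores.items()) / total

# Hypothetical similarity values for one pair of schema elements.
scores = {"name_edit_distance": 0.6, "data_type": 1.0, "ontology": 0.9}

# Equal weights versus a setting that favors the ontology-based measure.
uniform = {"name_edit_distance": 1.0, "data_type": 1.0, "ontology": 1.0}
ontology_heavy = {"name_edit_distance": 1.0, "data_type": 1.0, "ontology": 3.0}

uniform_score = aggregate(scores, uniform)          # (0.6 + 1.0 + 0.9) / 3
ontology_score = aggregate(scores, ontology_heavy)  # (0.6 + 1.0 + 2.7) / 5
```

Raising the ontology weight moves the aggregated value toward the ontology-based score, so the same pair may fall on either side of a selection threshold depending on how the weights were tuned.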