Tuning for Schema Matching - Schema Matching and Mapping

Fig. 10.4 COMA++ user interface for selecting combination strategy

In SMB [Anan and Avigdor 2008], the output of a weak similarity measure (called a first-line matcher) is combined with a decision maker (or second-line matcher) to discover correspondences. The combination strategy depends on the decision maker, which can be the Maximum Weighted Bipartite Graph algorithm, Stable Marriage, etc.
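
To illustrate how the choice of decision maker changes the result, the following minimal sketch applies two second-line matchers to the same similarity matrix. It is not SMB's implementation: the matrix values, the function names, and the brute-force search used for the maximum weighted assignment are assumptions made for the example, and the matrix is assumed to be square.

```python
from itertools import permutations

# Hypothetical output of a first-line matcher:
# SIM[i][j] is the similarity between source element i and target element j.
SIM = [
    [0.90, 0.80, 0.10],
    [0.85, 0.20, 0.30],
    [0.10, 0.30, 0.70],
]

def stable_marriage(sim):
    """Gale-Shapley matching in which both sides prefer higher similarity."""
    n = len(sim)
    # Each source element ranks the targets by decreasing similarity.
    prefs = [sorted(range(n), key=lambda j: -sim[i][j]) for i in range(n)]
    next_choice = [0] * n      # next target each source will propose to
    engaged_to = [None] * n    # engaged_to[j] = source currently paired with target j
    free = list(range(n))
    while free:
        i = free.pop()
        j = prefs[i][next_choice[i]]
        next_choice[i] += 1
        current = engaged_to[j]
        if current is None:
            engaged_to[j] = i
        elif sim[i][j] > sim[current][j]:
            engaged_to[j] = i      # target j prefers the new source
            free.append(current)
        else:
            free.append(i)         # proposal rejected, source i tries again
    return sorted((i, j) for j, i in enumerate(engaged_to))

def max_weight_assignment(sim):
    """Exhaustive search for the assignment with the highest total similarity.

    Only suitable for tiny examples; a real tool would use a dedicated
    algorithm such as the Hungarian method.
    """
    n = len(sim)
    best = max(permutations(range(n)),
               key=lambda p: sum(sim[i][p[i]] for i in range(n)))
    return [(i, best[i]) for i in range(n)]

print(stable_marriage(SIM))
print(max_weight_assignment(SIM))
```

On this matrix, the stable marriage pairs each element with the target of the same index, whereas the maximum weighted assignment swaps the first two targets to obtain a higher total similarity. The same first-line output thus leads to different correspondences depending on the decision maker.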

In YAM [Duchateau et al. 2009a, b], the combination of similarity measures is performed by a machine learning classifier. The authors consider that any classifier is a matcher, since it classifies pairs of schema elements as relevant or not. Thus, the combination of the similarity measures depends on the type of classifier (decision tree, Bayes network, neural network, etc.).
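
The idea that a classifier acts as a matcher can be sketched with a toy example. The code below is not YAM's implementation: it learns a decision tree of depth one (a single test on one measure) from a handful of labeled pairs. The two similarity measures, the training values, and the function names are all assumptions made for illustration.

```python
# Each sample holds the values of two hypothetical similarity measures
# for one pair of schema elements: (name_similarity, datatype_similarity).
SAMPLES = [
    (0.9, 0.2),
    (0.8, 0.9),
    (0.3, 0.9),
    (0.2, 0.1),
    (0.7, 0.4),
    (0.4, 0.8),
]
# True if an expert marked the pair as a relevant correspondence.
LABELS = [True, True, False, False, True, False]

def train_stump(samples, labels):
    """Learn a depth-1 decision tree.

    Tries every (measure, threshold) candidate and keeps the one
    with the fewest errors on the training pairs.
    """
    best = None
    for m in range(len(samples[0])):
        for t in sorted({s[m] for s in samples}):
            errors = sum((s[m] >= t) != y for s, y in zip(samples, labels))
            if best is None or errors < best[0]:
                best = (errors, m, t)
    _, measure, threshold = best
    return measure, threshold

def classify(stump, sample):
    """Return True if the pair is classified as relevant."""
    measure, threshold = stump
    return sample[measure] >= threshold

STUMP = train_stump(SAMPLES, LABELS)
print(STUMP)
print(classify(STUMP, (0.75, 0.10)))
print(classify(STUMP, (0.50, 0.95)))
```

With these training pairs, the learned tree tests only the first measure, so the second measure is effectively ignored. A different type of classifier trained on the same data would combine the measures differently, which is the point made above.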

To sum up, many tools have designed their own strategies to combine similarity measures. However, most of them are based on weighted functions that the users may have to tune.

5.2 Weights in Formulas

Previously, we have detailed different types of strategies. One of the most common strategies in the matching community is to use linear regression to aggregate the values computed by similarity measures. In that case, the weight given to each measure is important and depends on the domain and the schemas to be matched. For instance, if a domain ontology is available, one may decide to give a high weight to the measures that are able to use this ontology. However, tuning these weights manually still requires user expertise.
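
The effect of the weights can be seen on a small example. The sketch below assumes a normalized weighted sum as the aggregation function, with weights chosen by hand rather than fitted. The measure names, the similarity values, the weights, and the decision threshold are all hypothetical.

```python
def aggregate(scores, weights):
    """Weighted average of the similarity values, normalized by the sum of the weights."""
    total = sum(weights.values())
    return sum(weights[name] * scores[name] for name in weights) / total

# Hypothetical similarity values computed for one pair of schema elements.
SCORES = {"name": 0.3, "datatype": 0.5, "ontology": 0.9}

# Two hypothetical tunings: equal weights, and a high weight for the
# measure that exploits the domain ontology.
EQUAL = {"name": 1.0, "datatype": 1.0, "ontology": 1.0}
ONTOLOGY_HEAVY = {"name": 1.0, "datatype": 1.0, "ontology": 3.0}

THRESHOLD = 0.6  # assumed decision threshold

for label, weights in (("equal", EQUAL), ("ontology-heavy", ONTOLOGY_HEAVY)):
    score = aggregate(SCORES, weights)
    print(label, round(score, 3), "accepted" if score >= THRESHOLD else "rejected")
```

With equal weights, the pair falls below the threshold and is rejected. Once the ontology-based measure receives a high weight, the same pair is accepted, which shows how strongly the outcome depends on the tuning.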
