Tuning for Schema Matching - Schema Matching and Mapping

However, tuning the parameters to fulfil this goal is not an easy task for the user. Indeed, it has recently been pointed out that the main issue is how to select the most suitable similarity measures to execute for a given domain and how to adjust the multiple parameters [Lee et al. 2007]. Because the number of possible parameter configurations is so large, it is not possible to try them all. Moreover, the parameters require specific knowledge from the user. Imagine that a user has to choose one similarity measure for matching his/her schemas and to assign a threshold to this measure. Selecting the appropriate similarity measure implies that the user is a domain expert. Assigning the threshold further requires background knowledge about the chosen measure, e.g., its value distribution.
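To make the two decisions in this scenario concrete, the following minimal Python sketch matches two small lists of element names with a single similarity measure and a single threshold. It is an illustration under stated assumptions, not the method of any tool cited in this chapter: the measure is the ratio from Python's standard difflib module, and the function names, schemas and threshold values are all hypothetical.

```python
from difflib import SequenceMatcher


def name_similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] between two element names, ignoring case.

    Assumption: difflib's ratio stands in for whichever measure
    the user would actually select.
    """
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def match(source, target, threshold):
    """Return (source, target, score) triples whose score reaches the threshold."""
    pairs = []
    for s in source:
        for t in target:
            score = name_similarity(s, t)
            if score >= threshold:
                pairs.append((s, t, round(score, 2)))
    return pairs


# Hypothetical element names from two schemas.
source = ["custName", "custAddr", "orderDate"]
target = ["customer_name", "address", "date_of_order"]

# The same measure yields different match sets depending on the threshold.
loose = match(source, target, threshold=0.3)
strict = match(source, target, threshold=0.9)

if __name__ == "__main__":
    print("threshold 0.3:", loose)
    print("threshold 0.9:", strict)
```

Running the sketch with both thresholds shows why knowledge of the value distribution matters: the useful cut-off depends on how the chosen measure spreads its scores over correct and incorrect pairs, which differs from one measure to another.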

One of the ten challenges for ontology matching focuses on the tuning issue [Shvaiko and Euzenat 2008]. The authors state that tuning is split into three categories: (1) matcher selection, (2) matcher tuning and (3) combination strategy of the similarity measures. In the first category, we distinguish manual selection from automatic selection. In the former, evaluations of different matchers and benchmarking tools facilitate the choice of a matcher for a given task [Yatskevich 2003; Do et al. 2002; Duchateau et al. 2007; Ferrara et al. 2008]. In the latter, a few tools automatically select and build a schema matcher according to various parameters (YAM [Duchateau et al. 2009a, b]). The second category is mainly dedicated to tools such as eTuner [Lee et al. 2007], which automatically tunes a schema matcher with its best configuration for a given set of schemas. Any schema matcher that lets the user manually change the value of one or more of its parameters also falls into this category. The last category gathers the matchers that provide a manual combination of similarity measures (e.g., COMA [Aumueller et al. 2005], BMatch [Duchateau et al. 2008b]) and those that automatically combine these measures (SMB [Anan and Avigdor 2008] and MatchPlanner [Duchateau 2009]). Note that in the rest of this chapter, we consider the combination strategy to be one parameter that can be tuned. In other words, the third category is merged into the second one.
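Treating the combination strategy as one more tunable parameter can be sketched as follows. This is a hypothetical Python illustration, not the interface of COMA, BMatch, SMB or MatchPlanner: the two measures (difflib's ratio and a character-trigram Jaccard coefficient), the strategy names and the function names are all assumptions chosen for the example.

```python
from difflib import SequenceMatcher


def sequence_ratio(a: str, b: str) -> float:
    """difflib's similarity ratio in [0, 1], ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def trigram_jaccard(a: str, b: str) -> float:
    """Jaccard coefficient over the sets of character trigrams."""
    def grams(s: str) -> set:
        s = s.lower()
        # Strings shorter than three characters yield one short gram.
        return {s[i:i + 3] for i in range(max(len(s) - 2, 1))}

    ga, gb = grams(a), grams(b)
    return len(ga & gb) / len(ga | gb)


# Assumed set of measures and of combination strategies.
MEASURES = [sequence_ratio, trigram_jaccard]
COMBINERS = {
    "min": min,
    "max": max,
    "average": lambda scores: sum(scores) / len(scores),
}


def combined_similarity(a: str, b: str, strategy: str = "average") -> float:
    """Combine all measures; `strategy` is a parameter like any other."""
    scores = [measure(a, b) for measure in MEASURES]
    return COMBINERS[strategy](scores)


if __name__ == "__main__":
    for strategy in COMBINERS:
        print(strategy, combined_similarity("postalCode", "postcode", strategy))
```

Because the strategy is selected by name, a tuning procedure can vary it in the same way as a threshold or a weight, which is the sense in which the third category is merged into the second.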

The rest of the chapter surveys the most popular parameters in schema matching and the systems that tune them. We have grouped these parameters according to the entities to which they apply: input data, similarity measures, combination of similarity measures and finally the matcher. This means that a tool may be described at several levels, according to the parameters that it allows the user to tune. The chapter is organized as follows. Section 2 covers the main notions about tuning. We then present the different parameters that one might face when using a matcher. These parameters have been sorted into four categories. In Sect. 3, we present the parameters related to input data and user preferences. In Sect. 4, we describe the low-level parameters involved in the schema matching process, namely those dealing with the similarity measures. One level higher, we find the parameters that aim at combining the similarity measures; they are presented in Sect. 5. The highest level is the matcher selection, and the parameters involved are discussed in Sect. 6. Finally, we conclude and outline perspectives in Sect. 7.
