However, tuning the parameters to fulfil this goal is not an easy task for the user.
Indeed, it has recently been pointed out that the main issue is how to select the most
suitable similarity measures to execute for a given domain and how to adjust the
multiple parameters [ Lee et al. 2007 ]. Due to the numerous possible configurations
of the parameters, it is not possible to try them all. Besides, they require specific
knowledge from the users. Let us imagine that a user has to choose one similarity
measure for matching his/her schemas and to assign a threshold to this measure.
Selecting the appropriate similarity measure first implies that the user is a domain
expert. Further, assigning the threshold means that the user has some background
knowledge about the chosen measure, e.g., its value distribution.
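The difficulty of picking a threshold can be illustrated with a minimal sketch. It assumes a single string-based similarity measure (here Python's standard `difflib.SequenceMatcher`, chosen only for convenience) and hypothetical element names; none of these names or values come from the systems surveyed in this chapter.

```python
from difflib import SequenceMatcher

def name_similarity(a: str, b: str) -> float:
    """Return a similarity score in [0, 1] between two element names."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def match(source, target, threshold):
    """Return every (source, target) pair whose score reaches the threshold."""
    return [
        (s, t)
        for s in source
        for t in target
        if name_similarity(s, t) >= threshold
    ]

# Hypothetical schema element names, used only for illustration.
source = ["custName", "custAddr", "phone"]
target = ["customerName", "address", "telephone"]

if __name__ == "__main__":
    # The same measure yields different correspondences as the threshold moves.
    for threshold in (0.3, 0.5, 0.7):
        print(threshold, match(source, target, threshold))
```

Running the loop shows how the set of discovered correspondences shrinks as the threshold rises, which is why some knowledge of the measure's value distribution helps when choosing it.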
One of the ten challenges for ontology matching focuses on the tuning issue
[ Shvaiko and Euzenat 2008 ]. The authors state that this tuning splits into three categories:
(1) matcher selection, (2) matcher tuning and (3) combination strategy of
the similarity measures. In the first category, we distinguish manual selection from
automatic selection. In the former, evaluations of different matchers and benchmarking
tools facilitate the choice of a matcher for a given task [ Yatskevich 2003 ; Do
et al. 2002 ; Duchateau et al. 2007 ; Ferrara et al. 2008 ]. In contrast, a few
tools automatically select and build a schema matcher according
to various parameters (e.g., YAM [ Duchateau et al. 2009a, b ]). The second category is
mainly dedicated to tools such as eTuner [ Lee et al. 2007 ], which automatically
tunes a schema matcher with its best configuration for a given set of schemas. Any
schema matcher that lets the user manually change the value of
one or more of its parameters also falls into this category. The last category gathers
the matchers which provide a manual combination of similarity measures (e.g.,
COMA [ Aumueller et al. 2005 ], BMatch [ Duchateau et al. 2008b ]) and those
which automatically combine these measures (SMB [ Anan and Avigdor 2008 ] and
MatchPlanner from Duchateau [ 2009 ]). Note that in the rest of this chapter, we
consider that the combination strategy is one parameter that can be tuned. In other
words, the third category is merged into the second one.
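Treating the combination strategy as one more tunable parameter can be sketched as follows. This is an illustrative assumption, not the method of COMA, BMatch, SMB or MatchPlanner: the two measures, the strategy names (`"max"`, `"average"`, `"weighted"`) and the function names are all hypothetical.

```python
from difflib import SequenceMatcher

def edit_based(a: str, b: str) -> float:
    """Sequence-based similarity in [0, 1] (standard-library difflib)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def trigram_jaccard(a: str, b: str) -> float:
    """Jaccard similarity over character trigrams, in [0, 1]."""
    def grams(s: str) -> set:
        return {s[i:i + 3] for i in range(max(len(s) - 2, 1))}
    ga, gb = grams(a.lower()), grams(b.lower())
    return len(ga & gb) / len(ga | gb)

def combine(scores, strategy="average", weights=None):
    """Aggregate several similarity scores; the strategy is itself a parameter."""
    if strategy == "max":
        return max(scores)
    if strategy == "average":
        return sum(scores) / len(scores)
    if strategy == "weighted":
        if weights is None or len(weights) != len(scores):
            raise ValueError("weighted strategy needs one weight per score")
        return sum(w * s for w, s in zip(weights, scores)) / sum(weights)
    raise ValueError(f"unknown strategy: {strategy}")

def match_score(a, b, strategy="average", weights=None):
    """Score a pair of element names with both measures, then combine them."""
    scores = [edit_based(a, b), trigram_jaccard(a, b)]
    return combine(scores, strategy, weights)

if __name__ == "__main__":
    for strategy in ("max", "average"):
        print(strategy, match_score("custName", "customerName", strategy))
```

Under this view, switching from `"average"` to `"weighted"` or adjusting the weights is a tuning action of the same kind as changing a threshold, which is the reason the third category can be folded into the second.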
The rest of the chapter surveys the most popular parameters in schema
matching and the systems that tune them. We have grouped these parameters according to
the entities to which they apply: input data, similarity measures, combination
of similarity measures and finally the matcher. This means that a tool may be
described at different levels, according to the parameters that it allows users to tune.
Thus, the chapter is organized as follows: Sect. 2 covers the main notions about
tuning. We then present the different parameters that one might face when using
a matcher. These parameters have been sorted into four categories. In Sect. 3, we
present the parameters related to input data and user preferences. Then, in Sect. 4, we
describe low-level parameters involved in the schema matching process, namely
those dealing with the similarity measures. One level higher, we find parameters
which aim at combining the similarity measures; they are presented in Sect. 5. The
highest level is the matcher selection, and the involved parameters are discussed in
Sect. 6. Finally, we conclude and outline perspectives in Sect. 7.