Tuning for Schema Matching - Schema Matching and Mapping

As described in Bellahsene et al. [2011], the quality of this set of correspondences can be measured with metrics such as precision, recall, and F-measure. We denote by Fmes the F-measure applied to a set of correspondences $m_k$:

$$Fmes(m_k) \in [0, 1]$$
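As an illustration of the quality measures named above, the following minimal sketch computes precision, recall, and F-measure for a set of correspondences against a set validated by an expert. The function name `f_measure`, the representation of a correspondence as a pair of attribute names, and the sample data are assumptions made for this example; they do not come from the cited work.

```python
def f_measure(found, expected):
    """Return (precision, recall, F-measure) for a set of correspondences.

    found    -- correspondences produced by the matcher (m_k)
    expected -- correspondences validated by an expert (assumed available)
    """
    found, expected = set(found), set(expected)
    correct = found & expected
    precision = len(correct) / len(found) if found else 0.0
    recall = len(correct) / len(expected) if expected else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F-measure is the harmonic mean of precision and recall.
    return precision, recall, 2 * precision * recall / (precision + recall)


# Hypothetical data: each correspondence is a (source, target) attribute pair.
m_k = {
    ("person.name", "customer.fullname"),
    ("person.zip", "customer.postcode"),
    ("person.id", "customer.phone"),       # incorrect correspondence
}
expert = {
    ("person.name", "customer.fullname"),
    ("person.zip", "customer.postcode"),
    ("person.id", "customer.cid"),
    ("person.tel", "customer.phone"),
}

precision, recall, fmes = f_measure(m_k, expert)
print(f"precision={precision:.3f} recall={recall:.3f} Fmes={fmes:.3f}")
```

Two of the three discovered correspondences are correct and two of the four expected ones are found, so the F-measure lies strictly between 0 and 1.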

Thus, an optimal tuning $k$ in this context consists of finding a configuration function applied to parameters and domains so that the output of the schema matcher, $m_k$, is optimal given the input schemas. In other words, the configuration of the knobs is perfectly tuned to achieve the best matching quality: no change to the value of any knob would increase the matching quality.

Given that the configuration function applied to $(s, p, d)$ yields $k$ and $Match(s, k) = m_k$, there exists no configuration $z$ obtained from $(s, p, d)$ with $Match(s, z) = m_z$ such that $Fmes(m_z) > Fmes(m_k)$.

Typically, the measure of satisfaction over the output deals with quality (F-measure). But it is also possible to tune a schema matcher to optimize time performance, for instance with decision trees [Duchateau et al. 2008a].

In the next sections, we discuss the different parameters that one may face when using a schema matcher, based on these definitions.
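The definition of an optimal tuning can be sketched as an exhaustive search over the knob domains that keeps the configuration with the highest F-measure. Everything in this sketch is an assumption made for illustration: the toy matcher `match`, its two knobs (`threshold`, `ignore_case`), the sample schemas, and the availability of expert correspondences to score each output. Real matchers have far larger configuration spaces, so exhaustive search is rarely practical.

```python
from difflib import SequenceMatcher
from itertools import product


def f_measure(found, expected):
    """F-measure of a set of correspondences against the expected set."""
    correct = len(found & expected)
    if correct == 0:
        return 0.0
    precision = correct / len(found)
    recall = correct / len(expected)
    return 2 * precision * recall / (precision + recall)


def match(schemas, config):
    """Toy matcher: pair attributes whose name similarity reaches the threshold."""
    source, target = schemas
    result = set()
    for a, b in product(source, target):
        x, y = (a.lower(), b.lower()) if config["ignore_case"] else (a, b)
        if SequenceMatcher(None, x, y).ratio() >= config["threshold"]:
            result.add((a, b))
    return result


def optimal_tuning(schemas, domains, expected):
    """Try every configuration in the knob domains; keep the best F-measure."""
    names = list(domains)
    best_config, best_score = None, -1.0
    for values in product(*(domains[n] for n in names)):
        config = dict(zip(names, values))
        score = f_measure(match(schemas, config), expected)
        if score > best_score:
            best_config, best_score = config, score
    return best_config, best_score


# Hypothetical input schemas s, knob domains (p, d), and expert correspondences.
schemas = (["Name", "ZipCode", "Phone"], ["name", "zip", "telephone"])
expected = {("Name", "name"), ("ZipCode", "zip"), ("Phone", "telephone")}
domains = {"threshold": [0.3, 0.5, 0.7, 0.9], "ignore_case": [False, True]}

k, fmes_k = optimal_tuning(schemas, domains, expected)
print(k, round(fmes_k, 3))
```

By construction, no other configuration $z$ in the searched domains yields $Fmes(m_z) > Fmes(m_k)$, which mirrors the optimality condition stated above, restricted to the finite grid.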

3 Input and Data Parameters

In this section, we gather the data and input parameters that one may have to configure when using a schema matcher. We do not consider the input schemas to be tuning parameters, since a set of input schemas is compulsory to run the matcher. This section is therefore dedicated to parameters that may accompany the input schemas (e.g., expert correspondences, data instances) and to parameters related to the techniques used by the matcher (e.g., machine learning, external resources used by a similarity measure). Most of these parameters directly affect the quality or the time performance. Deciding whether to provide any of them, as well as choosing their values, is inherent to the tuning phase.

The section is organized according to the type of parameters. First, data parameters include expert feedback. Such reliable knowledge aims at improving the matching quality by reusing entities that have been checked by a domain expert. This feedback, as well as data instances, is often exploited by means of machine learning techniques. These machine learning techniques have various parameters of their own that make them efficient and/or flexible, and we study them in the second part. The third category gathers external resources, which mainly consist of an ontology or a dictionary. Finally, due to the complexity of the matching process and the design of numerous matchers, there exist very specific parameters that one only encounters when using a given tool.
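To make the four categories concrete, the following sketch groups them in a single configuration record. The class name `MatcherInputs`, its field names, and the sample values are hypothetical and do not correspond to any particular matching tool; the input schemas are deliberately left out, since they are not tuning parameters.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class MatcherInputs:
    # Data parameters: optional knowledge that accompanies the input schemas.
    expert_correspondences: Optional[set] = None
    data_instances: Optional[dict] = None
    # Parameters of the machine learning techniques that exploit the data above.
    learner_parameters: dict = field(default_factory=dict)
    # External resources, e.g., an ontology or a dictionary.
    external_resources: dict = field(default_factory=dict)
    # Parameters that exist only in one given tool.
    tool_specific: dict = field(default_factory=dict)

    def provided(self):
        """Names of the optional inputs that were actually supplied."""
        return [name for name, value in vars(self).items() if value]


# Hypothetical usage: only expert feedback and a dictionary are supplied.
inputs = MatcherInputs(
    expert_correspondences={("person.name", "customer.fullname")},
    external_resources={"dictionary": "path/to/dictionary"},  # placeholder path
)
print(inputs.provided())
```

Leaving a field at its default models the decision not to provide that parameter, which is itself part of the tuning phase.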
