As described in Bellahsene et al. [2011], the quality of this set of correspondences can be measured, e.g., with precision, recall and F-measure. We denote by Fmes the F-measure applied to a set of correspondences $m_k$:
$Fmes(m_k) \rightarrow [0, 1]$
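As a minimal sketch of these measures, the snippet below assumes that a correspondence is a (source element, target element) pair and that an expert reference set is available to compare against. The function names and the example element names are hypothetical. Precision is the fraction of discovered correspondences that are correct, recall is the fraction of expected correspondences that were discovered, and the F-measure is their harmonic mean, so it always lies in [0, 1].

```python
from typing import Set, Tuple

# Assumption: a correspondence is modeled as a (source element, target element) pair.
Correspondence = Tuple[str, str]


def precision(found: Set[Correspondence], reference: Set[Correspondence]) -> float:
    """Fraction of discovered correspondences that appear in the reference set."""
    if not found:
        return 0.0
    return len(found & reference) / len(found)


def recall(found: Set[Correspondence], reference: Set[Correspondence]) -> float:
    """Fraction of reference correspondences that were discovered."""
    if not reference:
        return 0.0
    return len(found & reference) / len(reference)


def fmes(found: Set[Correspondence], reference: Set[Correspondence]) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    p = precision(found, reference)
    r = recall(found, reference)
    if p + r == 0:
        return 0.0
    return 2 * p * r / (p + r)


# Hypothetical example: two of the three discovered correspondences are correct.
reference = {("name", "fullName"), ("addr", "address"), ("phone", "tel")}
m_k = {("name", "fullName"), ("addr", "address"), ("id", "tel")}
print(round(fmes(m_k, reference), 3))  # 0.667
```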
Thus, optimal tuning in this context consists of finding a configuration function k, applied to the parameters and their domains, such that the output $m_k$ of the schema matcher is optimal for the given input schemas. In other words, the knobs are configured to achieve the best matching quality: changing the value of any knob cannot improve the matching quality.
Given that $(s, p, d) \rightarrow k$ and $Match(s, k) \rightarrow m_k$:
$\nexists z \mid (s, p, d) \rightarrow z \text{ and } Match(s, z) \rightarrow m_z \text{ with } Fmes(m_z) > Fmes(m_k)$
Typically, the satisfaction measure applied to the output concerns quality (F-measure). However, a schema matcher can also be tuned to optimize time performance, for instance with decision trees [Duchateau et al. 2008a].
Based on these definitions, the next sections discuss the different parameters that one may face when using a schema matcher.
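The optimality condition above can be illustrated with a brute-force sketch, assuming that every knob has a small, finite domain (continuous knobs would have to be discretized) and that an expert reference set is available to compute Fmes. The matcher `toy_match`, its two knobs (`measure`, `threshold`) and the similarity scores are hypothetical, and this is not the tuning method of the cited works; it only checks the definition directly. Because every configuration is enumerated, no configuration z in that space yields a strictly higher F-measure than the returned k.

```python
from itertools import product
from typing import Callable, Dict, Iterator, List, Set, Tuple

Correspondence = Tuple[str, str]
Config = Dict[str, object]
Schemas = Tuple[List[str], List[str]]


def f_measure(found: Set[Correspondence], reference: Set[Correspondence]) -> float:
    """F-measure of `found` against an expert reference set (0.0 if nothing is correct)."""
    tp = len(found & reference)
    if tp == 0:
        return 0.0
    p, r = tp / len(found), tp / len(reference)
    return 2 * p * r / (p + r)


def configurations(domains: Dict[str, List[object]]) -> Iterator[Config]:
    """Enumerate every assignment of one domain value to each parameter (knob)."""
    names = list(domains)
    for values in product(*(domains[n] for n in names)):
        yield dict(zip(names, values))


def optimal_tuning(
    match: Callable[[Schemas, Config], Set[Correspondence]],
    schemas: Schemas,
    domains: Dict[str, List[object]],
    reference: Set[Correspondence],
) -> Tuple[Config, float]:
    """Exhaustive search: the returned k satisfies Fmes(Match(s, z)) <= Fmes(Match(s, k))
    for every configuration z in the enumerated space."""
    best_config: Config = {}
    best_score = -1.0
    for config in configurations(domains):
        score = f_measure(match(schemas, config), reference)
        if score > best_score:  # strict: ties keep the first configuration found
            best_config, best_score = config, score
    return best_config, best_score


# Hypothetical precomputed scores of two similarity measures (illustration only).
SIMILARITY = {
    "trigram": {("name", "fullName"): 0.45, ("addr", "address"): 0.8, ("id", "tel"): 0.6},
    "synonym": {("name", "fullName"): 0.9, ("addr", "address"): 0.7, ("id", "tel"): 0.0},
}


def toy_match(schemas: Schemas, config: Config) -> Set[Correspondence]:
    """Toy matcher with two knobs: the similarity measure and the acceptance threshold."""
    source, target = schemas
    scores = SIMILARITY[config["measure"]]
    return {(s, t) for s in source for t in target
            if scores.get((s, t), 0.0) >= config["threshold"]}


schemas = (["name", "addr", "id"], ["fullName", "address", "tel"])
reference = {("name", "fullName"), ("addr", "address")}
domains = {"measure": ["trigram", "synonym"], "threshold": [0.5, 0.75]}
print(optimal_tuning(toy_match, schemas, domains, reference))
# ({'measure': 'synonym', 'threshold': 0.5}, 1.0)
```

The number of enumerated configurations is the product of the domain sizes, so it grows exponentially with the number of knobs. This explains why exhaustive search is only practical for very small configuration spaces.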
3 Input and Data Parameters
In this section, we gather the data and input parameters that one may have to configure when using a schema matcher. We do not consider the input schemas to be tuning parameters, since a set of input schemas is compulsory to run the matcher. Thus, this section is dedicated to parameters that may come along with the input schemas (e.g., expert correspondences, data instances) or that relate to techniques used by the matcher (e.g., machine learning, external resources used by a similarity measure). Most of these parameters directly affect the matching quality or the time performance. Deciding whether to provide any of them, as well as choosing their values, is inherent to the tuning phase. The section is organized according to the type of parameters. First, data parameters include expert feedback. Such reliable knowledge aims at improving the matching quality by reusing entities that have been checked by a domain expert. This feedback, as well as data instances, is often exploited through machine learning techniques. These techniques have various parameters that make them efficient and/or flexible, and we study them in the second part. The third category gathers external resources, which mainly consist of an ontology or a dictionary. Finally, due to the complexity of the matching process and the diversity of matcher designs, there exist very specific parameters that one only encounters with a given tool.
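To make the categories above concrete, the optional inputs of a tuning phase can be sketched as a configuration record in which each category may or may not be supplied. All class and field names (`TuningInput`, `LearningParameters`, `classifier`, `training_set_size`, `external_resource`, ...) are hypothetical and do not correspond to any particular matcher's API. The input schemas are deliberately left out because they are compulsory rather than tunable.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple

Correspondence = Tuple[str, str]


@dataclass
class LearningParameters:
    """Hypothetical knobs of a machine learning technique used by the matcher."""
    classifier: str = "decision_tree"
    training_set_size: int = 0


@dataclass
class TuningInput:
    """Hypothetical optional inputs supplied alongside the (compulsory) input schemas."""
    expert_correspondences: Set[Correspondence] = field(default_factory=set)
    data_instances: Dict[str, List[str]] = field(default_factory=dict)
    learning: Optional[LearningParameters] = None
    external_resource: Optional[str] = None  # e.g., a path to an ontology or dictionary
    tool_specific: Dict[str, object] = field(default_factory=dict)

    def provided(self) -> List[str]:
        """Report which optional parameter categories were actually supplied."""
        categories = []
        if self.expert_correspondences or self.data_instances:
            categories.append("data")
        if self.learning is not None:
            categories.append("machine learning")
        if self.external_resource is not None:
            categories.append("external resources")
        if self.tool_specific:
            categories.append("tool-specific")
        return categories


cfg = TuningInput(expert_correspondences={("name", "fullName")},
                  external_resource="synonyms.txt")
print(cfg.provided())  # ['data', 'external resources']
```

Modeling every category as optional reflects the fact that deciding whether to provide a parameter at all is already part of the tuning phase.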