2 Preliminaries
For many systems, tuning is an important step to obtain the expected results or to
optimize either matching quality or execution time. In the schema matching context,
this is easy to verify, given the large number and diversity of parameters that
schema matchers provide. We now formalize the problem of tuning a schema matcher.
As depicted in Fig. 10.2, the schema matching process requires as inputs at least
two schemas, plus optional parameters, each of which can be given a value. These
values belong to a specific domain. We mainly distinguish three types of domains:
- A finite (multi-)valued set, e.g., a list of synonyms <(author, writer), ..., (book, volume)>
- An unordered discrete domain, e.g., mapping cardinality can be 1:1, 1:n, n:1, or n:m
- An ordered continuous domain, e.g., a threshold for a similarity measure in the range [0, 1]
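The three domain types can be sketched in code as follows. This is only an illustration: the class names and the `contains` method are hypothetical, and treating a value of the finite (multi-)valued set as any subset of the allowed items is an assumption, not something the text states.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FiniteSetDomain:
    """Finite (multi-)valued set: a value is assumed to be a subset of the items."""
    items: frozenset

    def contains(self, value) -> bool:
        return set(value) <= self.items


@dataclass(frozen=True)
class DiscreteDomain:
    """Unordered discrete domain: a value is exactly one of the choices."""
    choices: frozenset

    def contains(self, value) -> bool:
        return value in self.choices


@dataclass(frozen=True)
class ContinuousDomain:
    """Ordered continuous domain: a value lies in the closed range [low, high]."""
    low: float
    high: float

    def contains(self, value) -> bool:
        return self.low <= value <= self.high


# The three examples from the text.
synonyms = FiniteSetDomain(frozenset({("author", "writer"), ("book", "volume")}))
cardinality = DiscreteDomain(frozenset({"1:1", "1:n", "n:1", "n:m"}))
threshold = ContinuousDomain(0.0, 1.0)
```

Keeping the domains separate from the values makes it possible to check a candidate value before it is handed to a matcher.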
Similarly to Lee et al. [2007], we call a parameter with an associated value a knob.
However, we do not restrict knobs to values from a finite valued set. Examples of
knobs are (mapping cardinality, 1:1) and (threshold trigrams, 0.15).
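A knob, being a parameter paired with a value, maps naturally onto a small tuple type. The sketch below is one possible representation. The names `Knob`, `configuration`, and `value_of` are hypothetical and do not come from the text or from any particular schema matcher.

```python
from typing import Any, NamedTuple


class Knob(NamedTuple):
    """A parameter together with its associated value."""
    parameter: str
    value: Any


# The two example knobs from the text, gathered into one configuration.
configuration = frozenset({
    Knob("mapping cardinality", "1:1"),
    Knob("threshold trigrams", 0.15),
})


def value_of(config, parameter):
    """Return the value assigned to `parameter` in a configuration."""
    for knob in config:
        if knob.parameter == parameter:
            return knob.value
    raise KeyError(parameter)
```

Because the value field is untyped, the same structure holds a discrete choice such as 1:1 and a continuous threshold such as 0.15.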
More formally, we define $S = \langle s_1, s_2, \ldots, s_n \rangle$, the set of input
schemas that the user wants to match. The parameters are represented by the set
$P = \langle p_1, p_2, \ldots, p_k \rangle$. The value domains are gathered in a set
$D = \langle d_1, d_2, \ldots, d_n \rangle$, where each $d_i \in D$ is a set
$\langle \mathit{value}_{i1}, \mathit{value}_{i2}, \ldots, \mathit{value}_{it} \rangle$.
Finally, $K = \langle k_1, k_2, \ldots, k_l \rangle$ stands for the set of knobs, or the
configuration, of a schema matcher. With these definitions, we propose a Match function
which uses any schema matcher with a configuration $k$ to match a set of schemas $s$.
The output of the Match function with the configuration $k$ is a set of correspondences
$m_k$.

$$\mathrm{Match}(s, k) = m_k$$
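To make the Match function concrete, here is a minimal sketch with a toy matcher standing in for a real one. Everything in it is an assumption made for illustration: schemas are plain lists of element names, the configuration is a dictionary with a single knob named "threshold trigrams", and similarity is the Jaccard coefficient over character trigrams. The text does not prescribe any of these choices.

```python
from itertools import combinations


def trigrams(label):
    """Character trigrams of a lowercased label (short labels yield themselves)."""
    s = label.lower()
    if len(s) < 3:
        return {s}
    return {s[i:i + 3] for i in range(len(s) - 2)}


def similarity(a, b):
    """Jaccard similarity of the trigram sets of two labels, in [0, 1]."""
    ta, tb = trigrams(a), trigrams(b)
    return len(ta & tb) / len(ta | tb)


def match(schemas, config):
    """Toy Match(s, k): return the set of correspondences m_k.

    schemas: dict mapping a schema name to a list of element names (assumed shape).
    config:  dict mapping a parameter name to its value (assumed shape).
    """
    threshold = config["threshold trigrams"]
    correspondences = set()
    # Compare every pair of input schemas, element by element.
    for (name_a, elems_a), (name_b, elems_b) in combinations(schemas.items(), 2):
        for a in elems_a:
            for b in elems_b:
                if similarity(a, b) >= threshold:
                    correspondences.add(((name_a, a), (name_b, b)))
    return correspondences


schemas = {"s1": ["author", "title"], "s2": ["authors", "book_title"]}
m_low = match(schemas, {"threshold trigrams": 0.15})
m_high = match(schemas, {"threshold trigrams": 0.5})
```

Running the same schemas under two configurations gives two different sets of correspondences, which is why the output is indexed by the configuration as $m_k$.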
Fig. 10.2 Inputs and outputs of the schema matching process (inputs: schema-1 ... schema-n and param-1 = value-1 ... param-k = value-k; component: schema matcher; output: set of matches)