2 Preliminaries
For many systems, tuning is an important step to obtain the expected results or to optimize either matching quality or execution time. In the schema matching context, this is easy to verify, given the large number and diversity of parameters that schema matchers provide. We now formalize the problem of tuning a schema matcher. As depicted in Fig. 10.2, the schema matching process requires as input at least two schemas, plus optional parameters, each of which can be given a value. These values belong to a specific domain. We mainly distinguish three types of domains:
- A finite (multi-)valued set, e.g., a list of synonyms <(author, writer), …, (book, volume)>
- An unordered discrete domain, e.g., mapping cardinality can be 1:1, 1:n, n:1, or n:m
- An ordered continuous domain, e.g., a threshold for a similarity measure in the range [0, 1]
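The three kinds of domains can be sketched as small Python classes, each with a membership test that decides whether a value is admissible. This is a minimal sketch under our own assumptions: the class names and the `contains` method are hypothetical, and we assume that a value drawn from a finite multi-valued set is any collection whose items all belong to a known universe (here, two synonym pairs).

```python
from dataclasses import dataclass
from typing import Any, FrozenSet


@dataclass(frozen=True)
class FiniteSetDomain:
    """Finite (multi-)valued set, e.g. a list of synonym pairs.

    Assumption: a value is admissible if every item in it belongs to `universe`.
    """
    universe: FrozenSet[Any]

    def contains(self, value: Any) -> bool:
        return set(value) <= self.universe


@dataclass(frozen=True)
class DiscreteDomain:
    """Unordered discrete domain: the value must be one of the listed choices."""
    choices: FrozenSet[str]

    def contains(self, value: Any) -> bool:
        return value in self.choices


@dataclass(frozen=True)
class ContinuousDomain:
    """Ordered continuous domain: the value must be a number in [low, high]."""
    low: float
    high: float

    def contains(self, value: Any) -> bool:
        # Reject booleans, which Python otherwise treats as ints.
        is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
        return is_number and self.low <= value <= self.high


synonyms = FiniteSetDomain(frozenset({("author", "writer"), ("book", "volume")}))
cardinality = DiscreteDomain(frozenset({"1:1", "1:n", "n:1", "n:m"}))
threshold = ContinuousDomain(0.0, 1.0)

print(synonyms.contains([("author", "writer")]))  # True
print(cardinality.contains("2:3"))                # False
print(threshold.contains(0.15))                   # True
```

Only `ContinuousDomain` relies on an ordering of its values; membership in the other two domains is a pure set test, which mirrors the ordered/unordered distinction drawn in the list.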
Similarly to Lee et al. [2007], we call a knob a parameter together with its associated value. However, we do not restrict knobs to values from a finite set. Examples of knobs are (mapping cardinality, 1:1) and (threshold_trigrams, 0.15).
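Since a knob is simply a (parameter, value) pair, a set of knobs can be held as a list of such pairs or, equivalently, as a mapping from parameter names to values. In this sketch, `Knob` and `as_configuration` are hypothetical names, and the rule that a parameter may be set at most once is our own assumption, not something stated in the text.

```python
from typing import Any, Dict, Iterable, NamedTuple


class Knob(NamedTuple):
    """A knob: a parameter name paired with the value it is given."""
    parameter: str
    value: Any


def as_configuration(knobs: Iterable[Knob]) -> Dict[str, Any]:
    """Collect knobs into a parameter -> value mapping.

    Assumption (not from the text): each parameter is set at most once.
    """
    config: Dict[str, Any] = {}
    for knob in knobs:
        if knob.parameter in config:
            raise ValueError(f"parameter {knob.parameter!r} is set twice")
        config[knob.parameter] = knob.value
    return config


knobs = [Knob("mapping cardinality", "1:1"), Knob("threshold_trigrams", 0.15)]
config = as_configuration(knobs)
print(config)  # {'mapping cardinality': '1:1', 'threshold_trigrams': 0.15}
```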
More formally, we define $S = \langle s_1, s_2, \ldots, s_n \rangle$ as the set of input schemas that the user wants to match. The parameters are represented by the set $P = \langle p_1, p_2, \ldots, p_k \rangle$. The value domains are gathered in a set $D = \langle d_1, d_2, \ldots, d_n \rangle$, where each $d_i \in D$ is a set $\langle \mathit{value}_{i1}, \mathit{value}_{i2}, \ldots, \mathit{value}_{it} \rangle$. Finally, $K = \langle k_1, k_2, \ldots, k_l \rangle$ stands for the set of knobs, also called the configuration of a schema matcher. With these definitions, we define a function Match that runs any schema matcher with a configuration $k$ on a set of schemas $s$. The output of Match with configuration $k$ is a set of correspondences $m_k$:
$$\mathit{Match}(s, k) = m_k$$
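To make the signature Match(s, k) = m_k concrete, the following Python sketch implements a toy matcher. It is not any particular schema matcher: we assume that schemas are reduced to lists of element names, that the `threshold_trigrams` knob is compared against a Jaccard similarity over character trigrams, and that an optional, hypothetical `synonyms` knob forces a correspondence for the listed pairs. Every pair of input schemas is matched, and each correspondence is a tuple (schema, element, schema, element).

```python
from itertools import combinations
from typing import Any, Dict, List, Set, Tuple

# Hypothetical representations, chosen only for this sketch.
Schema = List[str]                          # a schema reduced to its element names
Correspondence = Tuple[str, str, str, str]  # (schema_a, element_a, schema_b, element_b)


def trigrams(label: str) -> Set[str]:
    """Character trigrams of a lower-cased label, padded so short labels still yield some."""
    padded = f"  {label.lower()} "
    return {padded[i:i + 3] for i in range(len(padded) - 2)}


def trigram_similarity(a: str, b: str) -> float:
    """Jaccard coefficient of the two trigram sets; 1.0 for identical labels."""
    ta, tb = trigrams(a), trigrams(b)
    return len(ta & tb) / len(ta | tb)


def match(schemas: Dict[str, Schema], config: Dict[str, Any]) -> Set[Correspondence]:
    """Toy Match(s, k): return the correspondences m_k found under configuration k."""
    threshold = float(config.get("threshold_trigrams", 0.5))  # default is an assumption
    synonyms = {frozenset(pair) for pair in config.get("synonyms", [])}
    result: Set[Correspondence] = set()
    # Compare every pair of distinct input schemas, element by element.
    for (name_a, elems_a), (name_b, elems_b) in combinations(sorted(schemas.items()), 2):
        for ea in elems_a:
            for eb in elems_b:
                is_synonym = frozenset((ea.lower(), eb.lower())) in synonyms
                if is_synonym or trigram_similarity(ea, eb) >= threshold:
                    result.add((name_a, ea, name_b, eb))
    return result


schemas = {"s1": ["author", "title", "price"], "s2": ["writer", "book_title", "price"]}
strict = match(schemas, {"threshold_trigrams": 0.5})
loose = match(schemas, {"threshold_trigrams": 0.15, "synonyms": [("author", "writer")]})

print(sorted(strict))  # [('s1', 'price', 's2', 'price')]
print(sorted(loose))
# [('s1', 'author', 's2', 'writer'), ('s1', 'price', 's2', 'price'), ('s1', 'title', 's2', 'book_title')]
```

Running the same schemas under two configurations shows why tuning matters: lowering the trigram threshold to 0.15 and supplying one synonym pair adds (title, book_title), whose trigram similarity is 4/13 ≈ 0.31, and (author, writer), which share no trigram at all.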
[Fig. 10.2 Inputs and outputs of the schema matching process: the input schemas schema-1 … schema-n and the parameters param-1 = value-1 … param-k = value-k feed the SCHEMA MATCHER, which outputs a set of matches.]