Databases Reference
In-Depth Information
A monotonic matching algorithm easily identifies the exact matching. Let
σ
∗
be the exact
matching, then
P(σ
∗
)
=
1. For any other matching
σ
,
P(σ
)<P(σ
∗
)
. Therefore, if
P(σ
)<
P(σ
∗
)
, then from monotonicity,
(σ
)<(σ
∗
)
. All one has to do then is to devise a method for
finding a matching
σ
∗
that maximizes
.
Number
of
occurrences
Figure 3.2:
Illustration of the monotonicity principle
Figure
3.2
provides an illustration of the monotonicity principle using a matching of a simpli-
fied version of two Web forms. Both schemata have nine attributes, all of which are matched under
the exact matching. Given a set of matchings, each value on the x-axis represents a class of schema
matchings with a different precision. The z-axis represents the similarity measure. Finally, the y-axis
stands for the number of schema matchings from a given precision class and with a given similarity
measure.
Two main insights are readily noticeable from Figure
3.2
. First, the similarity measures of
matchings within each schema matching class form a “bell” shape, centered around a specific sim-
ilarity measure. This behavior indicates a certain level of robustness, where the schema matcher
assigns similar similarity measures to matchings within each class. Second, the “tails” of the bell
shapes overlap. Therefore, a schema matching from a class with lower precision may receive a higher
similarity measure than one from a class with higher precision. This, of course, contradicts the defini-
tion of monotonicity. However, the first observation serves as motivation for a definition of statistical
monotonicity [
Gal et al.
,
2005a
], as follows:
be a set of matchings over
schemata
S
1
and
S
2
with
n
1
and
n
2
attributes, respectively, and define
n
=
max
(n
1
,n
2
)
.Let
1
,
2
, ...,
n
+
1
be subsets of
such that for all 1
Statistical monotonicity
Let
={
σ
1
,σ
2
, ..., σ
m
}
Definition 3.20
i
−
1
i
≤
i
≤
n
+
1
,σ
∈
i
iff
≤
P (σ ) <
n
.We
n