Databases Reference
In-Depth Information
Proof.
The correctness is immediately discernible from the construction of the
MD
algorithm and
Definition
5.10
.As
F
is assumed to be computable in time linear in the number of
F
's parameters,
generating
M
∗
takes time
O(
max
(n, n
)
2
m)
. Thus, the overall complexity of the
MD
algorithm is
O(
max
(n, n
)
2
m
+
)
.
5.3.2 MATRIX-DIRECT ALGORITHM WITH BOUNDING
We now extend the
MD
algorithm to deal with non-commutative functions.
Consider a set of similarity matrices
M
(
1
)
,...,M
(m)
over a pair of schemas
S
,
Definition 5.12
f,F
f
,F
f
,F
S
, and two sets of local and global aggregators
and
. We say that
dominates
f,F
f
,F
f,F
) if, for every matching
σ
from
S
to
S
,
on M
(
1
)
,...,M
(m)
(denoted as
we have:
f
,F
≥
f,F
(σ )
(σ ).
(5.4)
f
,F
Likewise, if Eq.
5.4
holds for all possible sets of similarity matrices, then we say that
strongly
f,F
dominates
.
Consider
a
schema
metamatching
problem
defined
by
a
set
of
similarity
matrices
f,F
M
(
1
)
,...,M
(m)
and a set of local and global aggregators
that
do not
commute on
M
(
1
)
,...,M
(m)
. Suppose that there exists a pair of functions
h, H
that (i)
do
commute on
f,F
M
(
1
)
,...,M
(m)
, and (ii) dominate
on these matrices. Corollary
5.13
, which follows im-
mediately from the definition of the
MD
algorithm, defines a simple property of this algorithm that
provides some intuition for the subsequent construction steps.
Corollary 5.13
Given a set of schema matchers and a pair of local and global aggregators
h, H
commuting on M
(
1
)
,...,M
(m)
, the top-K result of the
MD
algorithm with respect to
h, H
is a correct
f,F
top-K aggregation with respect to any set of local and global aggregators
, such that both
h, H
f,F
f,F
h, H
hold on M
(
1
)
,...,M
(m)
.
and
In general, nothing prevents Corollary
5.13
from being realized. To illustrate, consider the fol-
lowing set of four real-valued functions:
f(x)
x
2
, F (x)
x
2
/
2
, H (x)
x
. While
f
and
F
do not commute on reals (
F(f(x))
=
x
2
/
2 and
f(F(x))
=
x
2
/
4), the functions
h
and
H
are strongly commutative (
H (h(x))
=
=
x/
2
, h(x)
=
=
x
2
/
2), and we have
H (h(x))
F(f(x))
. Corol-
lary
5.13
provides us with some useful intuitions. Consider a schema metamatching problem defined
by a set of similarity matrices
M
(
1
)
,...,M
(m)
=
h(H (x))
=
=
f,F
that
do not
commute on
M
(
1
)
,...,M
(m)
. It is worth noting that we allow different matchers to use different
local aggregators and in such cases,
and local and global aggregators
f,F
is (trivially) defined not to commute on
M
(
1
)
,...,M
(m)
.
that
do
commute on
M
(
1
)
,...,M
(m)
Suppose there exists a pair of functions
h, H
and dominate