Databases Reference
In-Depth Information
f,F
on these matrices, yet are not dominated by
f,F
. For instance, let F be a weighted sum
as in Eq. 5.2 , and f be defined as:
j = 1 M j,σ(j) , j = 1 M j,σ(j) >t i
0 ,
f (i) (σ, M) =
,
(5.5)
otherwise
where t i > 0 is some predefined constant threshold. The intuition behind Eq. 5.5 is that matchers
which can no longer provide matchings with sufficient similarity (set as the threshold t i ) “quit” by
nullifying all further matchings. Another example, reflecting one of the settings used in schema
matching ( e.g. ,[ Bilke and Naumann , 2005 , Modica et al. , 2001 ]), is:
n
M j,σ(j) ·
δ M j,σ(j) >t j
f (i) (σ, M)
=
(5.6)
j = 1
where δ is the Kronecker discrete delta function. According to Eq. 5.6 , individual pair-wise attribute
matchings that do not pass a predefined, matcher-specific threshold are nullified. In both cases, it
is not hard to verify that f and F do not commute (in all but trivial cases of effectively redundant
thresholds.) On the other hand, functions h and H standing for simple sum and weighted sum (as
in Eq. 5.2 ) are (strongly) commutative, and we have
f,F
for both Eqs. 5.5 and 5.6 .For
such cases, we now present the Matrix-Direct-with-Bounding algorithm (or MDB , for short).
The MDB algorithm pseudocode is given as Algorithm 7 . Consider a schema metamatching
problem with aggregators
h, H
f,F
that do not commute on M ( 1 ) ,...,M (m) . The basic idea behind
the MDB algorithm is to use a pair of functions
f,F
h, H
(that both dominate
and commute on
f,F
M ( 1 ) ,...,M (m) ) as an upper bound for the “inconvenient”
of our actual interest. The MDB
algorithm behaves similarly to the MD algorithm if the latter is given with the aggregators
.
However, instead of reporting immediately on the generated matchings σ , the MDB algorithm uses
the decreasing aggregated weights of
h, H
h, H (σ ) to update the value of a threshold τ MDB . It is worth
noting that, due to commutativity of h and H (either strong or just with respect to M ( 1 ) ,...,M (m) ),
we have τ MDB = H,h (σ ) = h, H (σ ) . The threshold τ MDB
is used to judge our progress with
f,F
respect to the weights
that really matter. Theorem 5.14 shows that the MDB algorithm is
correct for any such upper bound
h, H
.
f,F
Theorem 5.14 Consider a set of m schema matchers with
being their local and global aggregators.
f,F
on M ( 1 ) ,...,M (m) , the MDB
Given a function pair
h, H
that both commute and dominate
f,F
algorithm correctly finds top-K valid matchings with respect to
.
Proof. Let Y be as in step 16 of the MDB algorithm. We need only show that every matching σ
Y
as every matching σ Y . By the definition of Y , this
is the case for each matching σ Y that has been seen by MDB . Thus, assume that σ was not seen.
f,F
has at least as high weight according to
Search WWH ::




Custom Search