Top-K Schema Matchings - Uncertain Schema Matching

Databases Reference

In-Depth Information

f,F

on these matrices, yet are not dominated by

f,F

. For instance, let F be a weighted sum

as in Eq. 5.2 , and f be defined as:

j = 1 M j,σ(j) , j = 1 M j,σ(j) >t i

0 ,

f (i) (σ, M) =

(5.5)

otherwise

where t i > 0 is some predefined constant threshold. The intuition behind Eq. 5.5 is that matchers

which can no longer provide matchings with sufficient similarity (set as the threshold t i ) “quit” by

nullifying all further matchings. Another example, reflecting one of the settings used in schema

matching ( e.g. ,[ Bilke and Naumann , 2005 , Modica et al. , 2001 ]), is:

M j,σ(j) ·

δ M j,σ(j) >t j

f (i) (σ, M)

(5.6)

j = 1

where δ is the Kronecker discrete delta function. According to Eq. 5.6 , individual pair-wise attribute

matchings that do not pass a predefined, matcher-specific threshold are nullified. In both cases, it

is not hard to verify that f and F do not commute (in all but trivial cases of effectively redundant

thresholds.) On the other hand, functions h and H standing for simple sum and weighted sum (as

in Eq. 5.2 ) are (strongly) commutative, and we have

f,F

for both Eqs. 5.5 and 5.6 .For

such cases, we now present the Matrix-Direct-with-Bounding algorithm (or MDB , for short).

The MDB algorithm pseudocode is given as Algorithm 7 . Consider a schema metamatching

problem with aggregators

≺

h, H

f,F

that do not commute on M ( 1 ) ,...,M (m) . The basic idea behind

the MDB algorithm is to use a pair of functions

f,F

h, H

(that both dominate

and commute on

f,F

M ( 1 ) ,...,M (m) ) as an upper bound for the “inconvenient”

of our actual interest. The MDB

algorithm behaves similarly to the MD algorithm if the latter is given with the aggregators

However, instead of reporting immediately on the generated matchings σ , the MDB algorithm uses

the decreasing aggregated weights of

h, H

h, H (σ ) to update the value of a threshold τ MDB . It is worth

noting that, due to commutativity of h and H (either strong or just with respect to M ( 1 ) ,...,M (m) ),

we have τ MDB = H,h (σ ) = h, H (σ ) . The threshold τ MDB

is used to judge our progress with

f,F

respect to the weights

that really matter. Theorem 5.14 shows that the MDB algorithm is

correct for any such upper bound

h, H

f,F

Theorem 5.14 Consider a set of m schema matchers with

being their local and global aggregators.

f,F

on M ( 1 ) ,...,M (m) , the MDB

Given a function pair

h, H

that both commute and dominate

f,F

algorithm correctly finds top-K valid matchings with respect to

Proof. Let Y be as in step 16 of the MDB algorithm. We need only show that every matching σ

∈ Y

as every matching σ ∈ Y . By the definition of Y , this

is the case for each matching σ ∈ Y that has been seen by MDB . Thus, assume that σ was not seen.

f,F

has at least as high weight according to

Uncertain Schema Matching

Search WWH ::

Custom Search

Home