Schema Matcher Ensembles - Uncertain Schema Matching


$M_{i,\sigma(i)}$ represents the similarity value of attribute $i$ in $S$ and its matching counterpart, attribute $\sigma(i)$, in $S'$. $f(\sigma, M)$ is a function that aggregates the similarity measures associated with the individual attribute correspondences that form a schema matching $\sigma$. A popular choice of local aggregator is the sum (or average) of attribute correspondence similarity measures (e.g., [Do and Rahm, 2002, Gal et al., 2005b, Melnik et al., 2002]), but other local aggregators have been found appealing as well. For example, the Dice local aggregator, suggested by Do and Rahm [2002], is the ratio of the number of successfully matched attributes (those whose similarity measure has passed a given threshold) to the total number of attributes in both schemata. Threshold-based aggregators have been presented as well, e.g., by Modica et al. [2001]. $f$ is typically assumed to be computable in time linear in the matrix size. However, at least technically, there is no restriction on the use of more sophisticated (and possibly more computationally intensive) local aggregators.
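
The local aggregators described above can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' implementation: the function names, the dictionary encoding of $\sigma$, and the sample matrix are hypothetical. In particular, the Dice variant assumes that "matched attributes" counts the attributes on both sides of each correspondence and that a similarity equal to the threshold counts as passing.

```python
from typing import Dict, List

Matrix = List[List[float]]  # M[i][j]: similarity of attribute i in S and attribute j in S'
Matching = Dict[int, int]   # sigma: maps attribute i in S to its counterpart sigma(i) in S'


def sum_aggregator(sigma: Matching, M: Matrix) -> float:
    """Sum of M[i][sigma(i)] over the correspondences in sigma."""
    return sum(M[i][j] for i, j in sigma.items())


def average_aggregator(sigma: Matching, M: Matrix) -> float:
    """Mean of M[i][sigma(i)]; defined here as 0.0 for an empty matching."""
    return sum_aggregator(sigma, M) / len(sigma) if sigma else 0.0


def dice_aggregator(sigma: Matching, M: Matrix, threshold: float) -> float:
    """Matched attributes (on both sides) over all attributes in both schemata."""
    # Assumption: a correspondence "passes" when its similarity is >= threshold.
    passed = [(i, j) for i, j in sigma.items() if M[i][j] >= threshold]
    matched = len({i for i, _ in passed}) + len({j for _, j in passed})
    total = len(M) + (len(M[0]) if M else 0)
    return matched / total if total else 0.0


# Hypothetical 2 x 3 similarity matrix and the matching sigma = {0 -> 0, 1 -> 1}.
M = [[0.75, 0.25, 0.0],
     [0.5, 1.0, 0.25]]
sigma = {0: 0, 1: 1}

print(sum_aggregator(sigma, M))        # 1.75
print(average_aggregator(sigma, M))    # 0.875
print(dice_aggregator(sigma, M, 0.8))  # 0.4 (one passing correspondence, 2 of 5 attributes)
```

Each function visits every correspondence in $\sigma$ once, so its cost is bounded by the size of the matrix, in line with the linear-time assumption on $f$.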

Given two schemata $S$ and $S'$, an ensemble of $m$ schema matchers may utilize different local aggregators $f^{(1)}, \ldots, f^{(m)}$. Each local aggregator computes the similarity measure of a matching for a different matcher and may be tied to the specific capabilities of that matcher. For example, it may be more meaningful to apply an average aggregator than a min aggregator to a matcher that does not use a threshold. The $m$ matchers produce an $m \times n \times n$ similarity cube of $n \times n$ similarity matrices $M^{(1)}, \ldots, M^{(m)}$. The similarity measures produced by such an ensemble of schema matchers can be aggregated using a real-valued global aggregation function $F\left(f^{(1)}(\sigma, M^{(1)}), \cdots, f^{(m)}(\sigma, M^{(m)})\right)$ [Do and Rahm, 2002, Gal et al., 2005b].

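$\langle f, F \rangle$ denotes the set of local and global aggregators, respectively. The aggregated weight provided by the $m$ matchers with $\langle f, F \rangle$ to the matching $\sigma$ is given as

$$\Omega^{\langle f, F \rangle}(\sigma) \equiv F\left(f^{(1)}(\sigma, M^{(1)}), \cdots, f^{(m)}(\sigma, M^{(m)})\right).$$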
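
To make the two-level aggregation concrete, the following sketch computes the aggregated weight of a matching $\sigma$ from a similarity cube. It is a minimal illustration under stated assumptions: `ensemble_weight`, the two local aggregators, the choice of `max` as the global aggregator $F$, and the sample cube are all hypothetical and are not taken from the cited works.

```python
from typing import Callable, Dict, List, Sequence

Matrix = List[List[float]]
Matching = Dict[int, int]  # sigma: attribute i in S -> attribute sigma(i) in S'
LocalAggregator = Callable[[Matching, Matrix], float]
GlobalAggregator = Callable[[Sequence[float]], float]


def average_aggregator(sigma: Matching, M: Matrix) -> float:
    """Local aggregator: mean similarity of the correspondences in sigma."""
    values = [M[i][j] for i, j in sigma.items()]
    return sum(values) / len(values) if values else 0.0


def min_aggregator(sigma: Matching, M: Matrix) -> float:
    """Local aggregator: weakest correspondence in sigma (0.0 if sigma is empty)."""
    return min((M[i][j] for i, j in sigma.items()), default=0.0)


def ensemble_weight(sigma: Matching,
                    cube: Sequence[Matrix],
                    local_aggregators: Sequence[LocalAggregator],
                    global_aggregator: GlobalAggregator) -> float:
    """Apply f^(l) to (sigma, M^(l)) for each matcher l, then combine with F."""
    if len(cube) != len(local_aggregators):
        raise ValueError("need exactly one local aggregator per matcher")
    local_scores = [f(sigma, M) for f, M in zip(local_aggregators, cube)]
    return global_aggregator(local_scores)


# Hypothetical cube for m = 2 matchers over two 2-attribute schemata.
cube = [
    [[0.75, 0.25], [0.5, 1.0]],   # M^(1), paired with the average aggregator
    [[0.5, 0.0], [0.25, 0.25]],   # M^(2), paired with the min aggregator
]
sigma = {0: 0, 1: 1}

weight = ensemble_weight(sigma, cube, [average_aggregator, min_aggregator], max)
print(weight)  # 0.875: max of the local scores 0.875 and 0.25
```

Passing the aggregators as plain callables keeps the pairing of each matcher with its own $f^{(l)}$ explicit, which mirrors the point that a local aggregator may be tied to the capabilities of a specific matcher.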

Many global aggregators proposed in the literature can be generalized as

$$F\left(f^{(1)}(\sigma, M^{(1)}), \cdots, f^{(m)}(\sigma, M^{(m)})\right) = \frac{\lambda}{m} \sum_{l=1}^{m} k_l \, f^{(l)}(\sigma, M^{(l)}), \tag{4.1}$$

where Eq. 4.1 can be interpreted as a (weighted) sum (with $\lambda = m$) or a (weighted) average (with $\lambda = 1$) of the local similarity measures, and where $k_l$ are some arbitrary weighting parameters. It is important to note that the choice of a global aggregator is ensemble-dependent, and it is considered to be a given property of the ensemble.
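
The generalized global aggregator of Eq. 4.1 can be sketched as follows, reading the equation as a sum of weighted local scores scaled by $\lambda / m$. That reading is an assumption, chosen because it yields a sum when $\lambda = m$ and an average when $\lambda = 1$. The function name and sample scores are hypothetical.

```python
from typing import Sequence


def linear_global_aggregator(local_scores: Sequence[float],
                             weights: Sequence[float],
                             lam: float) -> float:
    """Compute (lam / m) * sum_l k_l * f^(l)(sigma, M^(l)) for m local scores."""
    m = len(local_scores)
    if m == 0:
        raise ValueError("at least one local score is required")
    if len(weights) != m:
        raise ValueError("need exactly one weight k_l per local score")
    return (lam / m) * sum(k * s for k, s in zip(weights, local_scores))


# Hypothetical local scores f^(l)(sigma, M^(l)) for m = 4 matchers, uniform weights.
scores = [0.5, 1.0, 0.25, 0.25]
uniform = [1.0] * len(scores)

print(linear_global_aggregator(scores, uniform, lam=len(scores)))  # sum: 2.0
print(linear_global_aggregator(scores, uniform, lam=1))            # average: 0.5
```

Non-uniform weights $k_l$ turn the same function into a weighted sum or weighted average, for example to favor matchers that are known to perform well on a given domain.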

This model represents just one possible ensemble design, a linear parallel multiple-matcher design model. We now extend this model in three different dimensions, to demonstrate the ensemble design space. The first two dimensions are illustrated in Table 4.1, with representative examples for each design decision in the space.

Participation dimension: Determining the participating schema matchers in an ensemble is an important tuning parameter of the matching process. In Section 4.3, we provide a method for matcher selection. Works in the literature typically construct matcher ensembles from multiple
