$M_{i,\sigma(i)}$ represents the similarity value of attribute $i$ in $S$ and its matching counterpart,
attribute $\sigma(i)$, in $S'$. $f(\sigma, M)$ is a function that aggregates the similarity measures
associated with the individual attribute correspondences that form a schema matching $\sigma$. A popular
choice of local aggregator is the sum (or average) of the attribute correspondence similarity measures
(e.g., [Do and Rahm, 2002, Gal et al., 2005b, Melnik et al., 2002]), but other local aggregators have
been found appealing as well. For example, the Dice local aggregator, suggested by Do and Rahm [2002],
is the ratio of the number of successfully matched attributes (those whose similarity measure has
passed a given threshold) to the total number of attributes in both schemata. Threshold-based
aggregators have been presented as well, e.g., by Modica et al. [2001]. $f$ is typically assumed to be
computable in time linear in the matrix size. However, at least technically, there is no restriction on
the use of more sophisticated (and possibly more computation-intensive) local aggregators.
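The local aggregators described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the formulation of any cited system: the matching is assumed to be a list `sigma` where `sigma[i]` is the index of the attribute matched to attribute `i`, the matrix `M` is a list of rows, "passed a threshold" is read as `>=`, and the Dice variant counts each passing correspondence as two matched attributes (one per schema). All function names are hypothetical.

```python
def matched_similarities(sigma, M):
    """Return M[i][sigma[i]] for every attribute i of the first schema."""
    return [M[i][j] for i, j in enumerate(sigma)]


def sum_aggregator(sigma, M):
    """Sum of the similarity values of the correspondences in sigma."""
    return sum(matched_similarities(sigma, M))


def average_aggregator(sigma, M):
    """Average similarity value of the correspondences in sigma."""
    values = matched_similarities(sigma, M)
    return sum(values) / len(values)


def dice_aggregator(sigma, M, threshold=0.5):
    """Matched attributes (both schemata) over all attributes (both schemata).

    Assumption: a correspondence whose similarity is >= threshold marks one
    attribute in each schema as successfully matched.
    """
    passing = sum(1 for v in matched_similarities(sigma, M) if v >= threshold)
    total_attributes = len(M) + len(M[0])
    return 2 * passing / total_attributes


# Illustrative 3 x 3 similarity matrix and the identity matching.
M = [
    [0.75, 0.25, 0.0],
    [0.25, 1.0, 0.5],
    [0.0, 0.5, 0.25],
]
sigma = [0, 1, 2]

print(sum_aggregator(sigma, M))
print(average_aggregator(sigma, M))
print(dice_aggregator(sigma, M, threshold=0.5))
```

Each function makes a single pass over the correspondences in `sigma`, so it stays within the linear-time assumption mentioned above.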
Given two schemata $S$ and $S'$, an ensemble of $m$ schema matchers may utilize different local
aggregators $f^{(1)}, \ldots, f^{(m)}$. Each local aggregator computes the similarity measure of a
matching for a different matcher and may be tied to the specific capabilities of that matcher.
For example, it may be more meaningful to apply an average aggregator than a min aggregator
to a matcher that does not use a threshold. The $m$ matchers produce an $m \times n \times n$
similarity cube of $n \times n$ similarity matrices $M^{(1)}, \ldots, M^{(m)}$. The similarity measures
produced by such an ensemble of schema matchers can be aggregated, using a real-valued global
aggregation function
$F\bigl(f^{(1)}(\sigma, M^{(1)}), \cdots, f^{(m)}(\sigma, M^{(m)})\bigr)$ [Do and Rahm, 2002, Gal et al., 2005b].
$\langle f, F \rangle$ denotes the set of local and global aggregators, respectively. The aggregated
weight provided by the $m$ matchers with $\langle f, F \rangle$ to the matching $\sigma$ is given as

$$\langle f, F \rangle(\sigma) = F\bigl(f^{(1)}(\sigma, M^{(1)}), \cdots, f^{(m)}(\sigma, M^{(m)})\bigr).$$
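As a rough illustration of how the aggregated weight composes, the sketch below applies one local aggregator per matcher to its own slice of the similarity cube and passes the resulting values to a global aggregator. The representation (a list of matrices for the cube, a list `sigma` of matched indices) and every function name are assumptions made for this example only. The built-in `sum` stands in for the global aggregator; any real-valued function of the local values could be substituted.

```python
def average_aggregator(sigma, M):
    """Average similarity of the correspondences in sigma."""
    values = [M[i][j] for i, j in enumerate(sigma)]
    return sum(values) / len(values)


def min_aggregator(sigma, M):
    """Smallest similarity among the correspondences in sigma."""
    return min(M[i][j] for i, j in enumerate(sigma))


def aggregated_weight(sigma, cube, local_aggregators, global_aggregator):
    """Apply the l-th local aggregator to the l-th matrix, then combine."""
    if len(cube) != len(local_aggregators):
        raise ValueError("need exactly one local aggregator per matcher")
    local_values = [f(sigma, M) for f, M in zip(local_aggregators, cube)]
    return global_aggregator(local_values)


# Hypothetical cube for m = 2 matchers over 2 x 2 similarity matrices.
cube = [
    [[1.0, 0.0], [0.25, 0.5]],   # matrix of the first matcher
    [[0.5, 0.25], [0.0, 0.75]],  # matrix of the second matcher
]
sigma = [0, 1]

weight = aggregated_weight(
    sigma, cube, [average_aggregator, min_aggregator], global_aggregator=sum
)
print(weight)
```

Passing the local aggregators as a list keeps each one tied to its matcher, which mirrors the point that the choice of local aggregator may depend on the matcher's capabilities.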
Many global aggregators proposed in the literature can be generalized as

$$F\bigl(f^{(1)}(\sigma, M^{(1)}), \cdots, f^{(m)}(\sigma, M^{(m)})\bigr) = \frac{\lambda}{m} \sum_{l=1}^{m} k_l \, f^{(l)}(\sigma, M^{(l)}), \tag{4.1}$$

where Eq. 4.1 can be interpreted as a (weighted) sum (with $\lambda = m$) or a (weighted) average (with
$\lambda = 1$) of the local similarity measures, and where $k_l$ are some arbitrary weighting parameters. It is
important to note that the choice of a global aggregator is ensemble-dependent, and it is considered
to be a given property of the ensemble.
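The generalized form in Eq. 4.1 is a scaled weighted sum of the local similarity measures, which the following sketch makes concrete. The function name, the example values, and the uniform weights are assumptions chosen for illustration; the local values are taken as already computed by the local aggregators.

```python
def linear_global_aggregator(local_values, weights, lam):
    """Compute (lam / m) * sum_l k_l * f_l over the m local values."""
    m = len(local_values)
    if m == 0 or len(weights) != m:
        raise ValueError("need one weight per local similarity measure")
    total = sum(k * v for k, v in zip(weights, local_values))
    return lam * total / m


# Hypothetical local similarity measures from m = 3 matchers.
local_values = [0.5, 0.25, 0.75]
weights = [1.0, 1.0, 1.0]
m = len(local_values)

as_sum = linear_global_aggregator(local_values, weights, lam=m)      # lambda = m
as_average = linear_global_aggregator(local_values, weights, lam=1)  # lambda = 1
print(as_sum, as_average)
```

With uniform weights, setting `lam` to `m` cancels the division and yields a plain sum, while `lam = 1` yields the arithmetic mean; non-uniform `weights` give the weighted variants.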
This model represents just one possible ensemble design, a linear parallel multiple-matcher
design model. We now extend this model in three different dimensions, to demonstrate the ensemble
design space. The first two dimensions are illustrated in Table 4.1 , with representative examples for
each design decision in the space.
Participation dimension: Determining the participating schema matchers in an ensemble is an
important tuning parameter of the matching process. In Section 4.3 , we provide a method for
matcher selection. Works in the literature typically construct matcher ensembles from multiple