define $M_i$ to be a random variable representing the similarity measure of a randomly chosen matching from $\Sigma_i$. $\Sigma$ is statistically monotonic if the following inequality holds for any $1 \le i < j \le n + 1$:

$$\bar{\Omega}(M_i) < \bar{\Omega}(M_j)$$

where $\bar{\Omega}(M)$ stands for the expected value of $M$.
Intuitively, a schema matching algorithm is statistically monotonic with respect to two given
schemata if the expected certainty increases with precision. Statistical monotonicity can help explain
certain phenomena in schema matching. For example, it can explain the lack of “industrial strength”
[Bernstein et al., 2004] schema matchers and serve as a guideline as we seek better ways to use schema
matchers. Also, it helps us understand why schema matcher ensembles work well (see Chapter 4).
Finally, it serves as a motivation for seeking top-K matchings (see Chapter 5).
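The definition of statistical monotonicity can be made concrete with a small check. The following is a minimal sketch, not the author's procedure: it assumes each matching is given as a (precision, similarity) pair, that the subsets from which each M_i is drawn are n + 1 equal-width precision bins (the binning rule is defined outside this section, so this is an assumption), and that empty bins are simply skipped. The function names are hypothetical.

```python
from math import floor


def bin_index(precision, n):
    """0-based bin of a precision in [0, 1]; precision 1.0 lands in the last of n + 1 bins."""
    # Assumed rule: equal-width bins of size 1/n, plus one final bin for precision 1.0.
    return min(floor(precision * n), n)


def expected_similarity_per_bin(matchings, n):
    """Mean similarity of the matchings in each precision bin (None for an empty bin)."""
    bins = [[] for _ in range(n + 1)]
    for precision, similarity in matchings:
        bins[bin_index(precision, n)].append(similarity)
    return [sum(b) / len(b) if b else None for b in bins]


def is_statistically_monotonic(matchings, n):
    """True if the expected similarity strictly increases from bin to bin."""
    # Assumption: empty bins are ignored rather than treated as violations.
    means = [m for m in expected_similarity_per_bin(matchings, n) if m is not None]
    return all(a < b for a, b in zip(means, means[1:]))
```

Note that the check compares bin averages only. An individual low-precision matching may still score higher than a high-precision one without violating the property, which is what separates statistical monotonicity from a strict, matching-by-matching ordering.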
There are instances where a matcher is considered monotonic for some aggregators but not for
others [Gal et al., 2005a]. Consider, for example, the min operator. Consider further two attribute
sets, $\{A_1, A_2\}$ and $\{A'_1, A'_2\}$, with the following attribute correspondence similarity matrix:

          A'_1    A'_2
A_1       0.5     0.8
A_2       0.4     0.5

Let the exact matching be the matching in which $A_1$ is mapped to $A'_1$ and $A_2$ to $A'_2$. Using
the average aggregator, the exact matching has a similarity of 0.5, while the best matching switches
the two correspondences to $(A_1, A'_2)$ and $(A_2, A'_1)$, with a similarity measure of 0.6. Therefore,
the set of possible matchings is non-monotonic. However, by using the min operator, the schema
matching similarity of the exact matching (0.5) is higher than that of the best matching (0.4).
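The arithmetic of this example can be reproduced with a few lines of Python. This is an illustrative sketch only: it assumes the rows of the matrix index the first attribute set and the columns the second, and it enumerates all one-to-one matchings by brute force, which is feasible only for tiny schemata. The names `SIM`, `EXACT`, `matching_similarity`, and `best_matching` are hypothetical.

```python
from itertools import permutations

# Similarity matrix from the example. SIM[i][j] is the similarity between
# attribute i of the first set and attribute j of the second set
# (row/column orientation is an assumption).
SIM = [[0.5, 0.8],
       [0.4, 0.5]]

# The exact matching maps each attribute to its same-numbered counterpart.
EXACT = (0, 1)


def average(values):
    """Arithmetic mean of a non-empty list of similarities."""
    return sum(values) / len(values)


def matching_similarity(assignment, aggregator):
    """Aggregate the similarities of the correspondences (i, assignment[i])."""
    return aggregator([SIM[i][j] for i, j in enumerate(assignment)])


def best_matching(aggregator):
    """One-to-one matching with the highest aggregated similarity."""
    return max(permutations(range(len(SIM))),
               key=lambda a: matching_similarity(a, aggregator))


for name, aggregator in (("average", average), ("min", min)):
    best = best_matching(aggregator)
    print(name, "best:", best,
          "exact is best:", best == EXACT,
          "score:", round(matching_similarity(best, aggregator), 2))
```

Under the average aggregator the switched matching wins, so the exact matching is not ranked first. Under min, the single weak correspondence of 0.4 drags the switched matching below the exact one.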
Another interesting observation [Gal et al., 2005a] is that the use of an average aggregator
is preferred over any t-norm operator to compute matching similarity. To show this, closely related
attributes are defined as attributes that may map well in various combinations. Any pair of
attributes in a closely related attribute set has about the same similarity measure as any other pair,
making it hard for a matcher to differentiate them. The paper suggests that the use of the average
aggregator is more likely to yield monotonic matchings whenever attributes do not form closely
related sets. It is not true, however, that any other t-norm performs better if there are closely related
attribute sets.