Modeling Uncertain Schema Matching - Uncertain Schema Matching


define $M_i$ to be a random variable, representing the similarity measure of a randomly chosen matching from $\Sigma_i$. $\Sigma$ is statistically monotonic if the following inequality holds for any $1 \le i < j \le n + 1$:

$$\bar{\Omega}(M_i) < \bar{\Omega}(M_j)$$

where $\bar{\Omega}(M)$ stands for the expected value of $M$.
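
The definition above can be sketched as a small check. This is a minimal illustration, not the author's procedure: it assumes the matchings have already been grouped into classes ordered by increasing precision, and it estimates each expected value by the sample mean (that is, a uniformly random choice within a class). The function name `is_statistically_monotonic` and the sample numbers are hypothetical.

```python
from statistics import mean


def is_statistically_monotonic(classes):
    """Check whether expected similarity strictly increases across classes.

    `classes` is a list of lists of similarity measures, one inner list per
    precision class, ordered from lowest to highest precision (an assumption
    of this sketch). Empty classes are skipped, which is also an assumption.
    """
    expected = [mean(c) for c in classes if c]
    # Strict increase between neighbours implies it for every pair i < j.
    return all(lo < hi for lo, hi in zip(expected, expected[1:]))


# Hypothetical similarity measures, grouped by increasing precision.
overlapping = [[0.2, 0.5], [0.4, 0.6]]   # class means: 0.35, then 0.5
decreasing = [[0.5], [0.4, 0.45]]        # second class mean is below 0.5

print(is_statistically_monotonic(overlapping))
print(is_statistically_monotonic(decreasing))
```

The first input passes the check even though one low-precision matching (0.5) scores higher than one high-precision matching (0.4): the property constrains only the expected values of the classes, not every individual pair of matchings.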

Intuitively, a schema matching algorithm is statistically monotonic with respect to two given schemata if the expected certainty increases with precision. Statistical monotonicity can help explain certain phenomena in schema matching. For example, it can explain the lack of “industrial strength” [Bernstein et al., 2004] schema matchers and serve as a guideline as we seek better ways to use schema matchers. Also, it helps us understand why schema matcher ensembles work well (see Chapter 4). Finally, it serves as a motivation for seeking top-K matchings (see Chapter 5).

There are instances where a matcher is considered monotonic for some aggregators but not for others [Gal et al., 2005a]. Consider, for example, the min operator. Consider further two attribute sets, $\{A_1, A_2\}$ and $\{A'_1, A'_2\}$, with the following attribute correspondence similarity matrix:

           $A'_1$   $A'_2$
$A_1$      0.5      0.8
$A_2$      0.4      0.5

Let the exact matching be the matching in which $A_1$ is mapped to $A'_1$ and $A_2$ to $A'_2$. Using the average aggregator, the exact matching has a similarity of 0.5, while the best matching switches the two correspondences to $(A_1, A'_2)$, $(A_2, A'_1)$, with a similarity measure of 0.6. Therefore, the set of possible matchings is non-monotonic. However, using the min operator, the similarity of the exact matching (0.5) is higher than that of the switched matching (0.4).
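
The arithmetic in this example can be reproduced with a short sketch. The matrix values are those given above; the dictionary layout, the primed attribute labels, and the helper name `matching_similarity` are illustrative assumptions rather than anything prescribed by the text.

```python
from statistics import mean

# Attribute correspondence similarities from the example matrix.
similarity = {
    ("A1", "A'1"): 0.5, ("A1", "A'2"): 0.8,
    ("A2", "A'1"): 0.4, ("A2", "A'2"): 0.5,
}

exact = [("A1", "A'1"), ("A2", "A'2")]     # the exact matching
switched = [("A1", "A'2"), ("A2", "A'1")]  # the two correspondences switched


def matching_similarity(matching, aggregate):
    """Aggregate the correspondence similarities of one matching."""
    return aggregate([similarity[pair] for pair in matching])


avg_exact = matching_similarity(exact, mean)        # 0.5
avg_switched = matching_similarity(switched, mean)  # about 0.6
min_exact = matching_similarity(exact, min)         # 0.5
min_switched = matching_similarity(switched, min)   # 0.4

# Average ranks the switched matching first; min ranks the exact one first.
print(avg_switched > avg_exact)
print(min_exact > min_switched)
```

Both comparisons hold, so the same matrix orders the two matchings differently depending on the aggregator.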

Another interesting observation [Gal et al., 2005a] is that the use of an average aggregator is preferred over any t-norm operator to compute matching similarity. To show this, closely related attributes are defined to be attributes that may map well in various combinations. Any pair of attributes in a closely related attribute set has about the same similarity measure as any other pair, making it hard for a matcher to differentiate them. The paper suggests that the use of the average aggregator is more likely to yield monotonic matchings whenever attributes do not form closely related sets. It is not true, however, that any t-norm performs better if there are closely related attribute sets.
