Enhancing the Capabilities of Attribute Correspondences - Schema Matching and Mapping

The first observation is that top-K schema matchings play a pivotal role in identifying attribute correspondences. We have shown that top-K matchings can help identify good candidates both for contextual attribute correspondences and for probabilistic attribute correspondences. In this research direction there are still many open questions, the first of which is the ability to identify top-K matchings in polynomial time.
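
To make the notion of top-K matchings concrete, the following Python sketch enumerates the K best 1:1 matchings of a similarity matrix by brute force. The function name `top_k_matchings`, the restriction to 1:1 matchings, and scoring a matching by the sum of its similarity entries are assumptions for illustration. Because the enumeration is exponential in the number of attributes, the sketch shows why a polynomial-time method is of interest rather than providing one.

```python
import heapq
from itertools import permutations


def top_k_matchings(sim, k):
    """Return the k best 1:1 matchings of a similarity matrix (brute force).

    sim[i][j] is the similarity between source attribute i and target
    attribute j. Assumes len(sim) <= len(sim[0]) so every source attribute
    can be matched. A matching is scored by the sum of its entries, which is
    a hypothetical choice of aggregator.
    """
    n_src = len(sim)
    n_tgt = len(sim[0]) if sim else 0
    scored = []
    # Every injective assignment of source rows to target columns is a
    # candidate matching; there are n_tgt! / (n_tgt - n_src)! of them.
    for cols in permutations(range(n_tgt), n_src):
        score = sum(sim[i][j] for i, j in enumerate(cols))
        scored.append((score, tuple(enumerate(cols))))
    # Keep the k highest-scoring matchings, best first.
    return heapq.nlargest(k, scored, key=lambda pair: pair[0])


sim = [[0.9, 0.1, 0.3],
       [0.2, 0.8, 0.4]]
for score, matching in top_k_matchings(sim, 3):
    print(round(score, 2), matching)
```

For this matrix, the best matching pairs source attribute 0 with target 0 and source attribute 1 with target 1; the second-best keeps 0 with 0 but moves 1 to target 2.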

A model was proposed in Magnani et al. [2005] for combining semantic and probabilistic attribute correspondences using constructs of uncertain semantic relationships in an ER model. An uncertain semantic relationship is a distribution of beliefs over the set of all possible semantic relationships, using belief functions [Shafer 1976]. The set of possible semantic relations serves as the frame of discernment (denoted $\Theta$), based on which two functions are defined, namely belief and plausibility. Both functions assign a value to a subset of the frame of discernment, starting from the basic probability mass $m$ that is assigned to each subset of the frame of discernment. The belief of a set $A \in 2^{\Theta}$ sums the probability mass of all subsets $B \subseteq A$. The plausibility of a set $A$ sums the probability mass of all subsets that intersect with $A$, i.e., all $B$ such that $A \cap B \neq \emptyset$:

$$\mathrm{Bel}(A) = \sum_{B \subseteq A} m(B), \qquad \mathrm{Pl}(A) = \sum_{B \cap A \neq \emptyset} m(B).$$
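
The two definitions can be checked on a small example. The sketch below represents a mass function as a Python dict from `frozenset` focal sets to masses, assumes the masses sum to 1 with no mass on the empty set, and uses a hypothetical four-element frame of semantic relationships; the frame's labels and the mass values are invented for illustration and are not taken from Magnani et al. [2005].

```python
# Hypothetical frame of discernment: possible semantic relationships
# between two attributes (labels chosen for illustration only).
THETA = frozenset({"equivalence", "inclusion", "overlap", "disjointness"})


def belief(mass, a):
    """Bel(A): sum of m(B) over focal sets B that are subsets of A."""
    a = frozenset(a)
    return sum(m for b, m in mass.items() if b <= a)


def plausibility(mass, a):
    """Pl(A): sum of m(B) over focal sets B that intersect A."""
    a = frozenset(a)
    return sum(m for b, m in mass.items() if b & a)


# Hypothetical mass function for one attribute pair; masses sum to 1.
mass = {
    frozenset({"equivalence"}): 0.5,
    frozenset({"equivalence", "inclusion"}): 0.3,
    THETA: 0.2,  # mass on the whole frame expresses ignorance
}

print(belief(mass, {"equivalence"}), plausibility(mass, {"equivalence"}))
```

Here the belief in equivalence is 0.5, while its plausibility is 1.0, since every focal set contains equivalence; the gap between the two values reflects the mass that is not committed to any specific relationship.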

Combining semantic and probabilistic attribute correspondences (as proposed in Magnani et al. [2005], for example) can be easily captured by the matrix abstraction. In Sect. 4.2, we outlined how semantic attribute correspondences can be captured using similarity matrices. When using the model proposed in Magnani et al. [2005], the aggregator can be Dempster's combination rule [Shafer 1976].
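
As an illustration of such an aggregator, the sketch below implements Dempster's rule of combination for two mass functions over the same frame: the combined mass of a set A is the sum of m1(B)·m2(C) over all pairs with B ∩ C = A, normalized by 1 − K, where K is the total product mass of pairs whose intersection is empty. The dict-of-`frozenset` representation, the function name `dempster_combine`, and the idea of feeding it the outputs of two matchers for the same attribute pair are assumptions for illustration.

```python
from itertools import product


def dempster_combine(m1, m2):
    """Combine two mass functions (dict: frozenset -> mass) by Dempster's rule.

    Assumes each input sums to 1 and assigns no mass to the empty set.
    """
    combined = {}
    conflict = 0.0  # K: product mass of pairs with an empty intersection
    for (b, mb), (c, mc) in product(m1.items(), m2.items()):
        a = b & c
        if a:
            combined[a] = combined.get(a, 0.0) + mb * mc
        else:
            conflict += mb * mc
    if conflict >= 1.0 or not combined:
        raise ValueError("totally conflicting evidence; combination undefined")
    # Renormalize the non-conflicting mass so the result sums to 1.
    return {a: v / (1.0 - conflict) for a, v in combined.items()}


# Hypothetical evidence from two matchers about one attribute pair.
EQ = frozenset({"equivalence"})
INC = frozenset({"inclusion"})
FRAME = frozenset({"equivalence", "inclusion", "overlap"})
result = dempster_combine({EQ: 0.6, FRAME: 0.4}, {INC: 0.7, FRAME: 0.3})
print(result)
```

In this example the two matchers partly contradict each other (K = 0.42), so the remaining mass 0.58 is rescaled; an entry of a combined matrix could then hold, for instance, the resulting belief or plausibility of a given relationship.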

Consider now that entries in each such matrix are in [0, 1], reflecting the probability (or plausibility) that the corresponding semantic attribute correspondence holds. This opens a new challenge: querying a database that uses probabilistic semantic attribute correspondences. First, the notion of querying using semantic attribute correspondences should be examined carefully. Then an analysis similar to that of Dong et al. [2007] and Gal et al. [2009], where possible-worlds semantics was carefully defined for probabilistic schema mapping, can be extended to the case of probabilistic semantic attribute correspondences. It is worth noting that the analysis in Magnani et al. [2005] described a de facto set of possible worlds, each world represented by a different ER schema.
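
For reference, the following is a minimal sketch of by-table possible-worlds semantics in the spirit of Dong et al. [2007], for a probabilistic schema mapping rather than for the semantic attribute correspondences discussed above, whose querying semantics remain open. Each alternative mapping defines one possible world in which the whole table is read through that mapping, and an answer tuple's probability is the total probability of the worlds whose answer contains it. The function name `by_table_answers`, the dict-based mapping representation, and the example schema are hypothetical.

```python
def by_table_answers(source_rows, p_mapping, query):
    """Answer a query under by-table possible-worlds semantics.

    source_rows: list of dicts, source attribute -> value.
    p_mapping:   list of (mapping, probability) pairs; each mapping is a dict
                 target attribute -> source attribute, and the probabilities
                 are assumed to sum to 1.
    query:       function from a list of target rows to answer tuples.
    Returns a dict mapping each answer tuple to its probability.
    """
    answers = {}
    for mapping, prob in p_mapping:
        # One possible world: the whole table is read through one mapping.
        world = [{t: row[s] for t, s in mapping.items()} for row in source_rows]
        # set() ensures each world contributes at most once per answer tuple.
        for ans in set(query(world)):
            answers[ans] = answers.get(ans, 0.0) + prob
    return answers


# Hypothetical example: the target attribute "phone" corresponds either to
# the source attribute "hPhone" or to "oPhone".
rows = [{"name": "Alice", "hPhone": "123", "oPhone": "456"}]
p_mapping = [({"name": "name", "phone": "hPhone"}, 0.6),
             ({"name": "name", "phone": "oPhone"}, 0.4)]


def alice_phone(world):
    """Query: the phone numbers of Alice in a given possible world."""
    return [(r["phone"],) for r in world if r["name"] == "Alice"]


print(by_table_answers(rows, p_mapping, alice_phone))  # {('123',): 0.6, ('456',): 0.4}
```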

Contextual and by-tuple probabilistic attribute correspondences seem to be complementary. A by-tuple probabilistic attribute correspondence represents a situation in which there is uncertainty as to whether a given tuple should be interpreted using one correspondence or the other. Contextual attribute correspondences model exactly this knowledge, i.e., which correspondence should be used to interpret which tuple. Therefore, by-tuple probabilistic attribute correspondences are needed whenever no information regarding the contextual attribute correspondence is available. Whenever contextual attribute correspondences are gathered automatically, using statistical methods as described in Sect. 3.2, another layer of uncertainty is added to the modeling. Therefore, contextual attribute correspondences should also be extended to provide probabilistic alternative versions.
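
The complementarity can be sketched under a hypothetical model in which each contextual attribute correspondence is a (condition, distribution) pair. A hand-specified contextual correspondence puts all probability on one target attribute, a statistically gathered one (a probabilistic alternative version) spreads it, and the by-tuple distribution applies only when no condition holds. The function name `interpret`, the attribute names, the conditions, and the probabilities are invented for illustration.

```python
def interpret(row, contextual, by_tuple):
    """Distribution over target attributes for one source tuple.

    contextual: list of (condition, distribution) pairs; a condition is a
                predicate on the row, and a distribution maps target
                attribute -> probability. The first matching condition wins.
    by_tuple:   distribution used when no contextual condition applies.
    """
    for condition, dist in contextual:
        if condition(row):
            return dict(dist)  # the context determines the correspondence
    return dict(by_tuple)      # no context available: by-tuple uncertainty


# Hypothetical example: the source attribute "phone" corresponds to either
# "office_phone" or "home_phone" in the target schema.
contextual = [
    # Hand-specified context: business contacts list an office phone.
    (lambda r: r.get("type") == "business", {"office_phone": 1.0}),
    # Statistically gathered context, hence itself uncertain.
    (lambda r: r.get("type") == "private", {"home_phone": 0.9, "office_phone": 0.1}),
]
by_tuple = {"home_phone": 0.5, "office_phone": 0.5}

for row in [{"type": "business"}, {"type": "private"}, {}]:
    print(row, interpret(row, contextual, by_tuple))
```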

Acknowledgments

I thank Wenfei Fan, Pavel Shvaiko, Luna Dong, and Tomer Sagi for useful comments. The views and conclusions contained in this chapter are those of the author.
