The first observation is that top-K schema matchings play a pivotal role in iden-
tifying attribute correspondences. We have shown that top-K matchings can serve
in identifying both good candidates for contextual attribute correspondences and
probabilistic attribute correspondences. In this research direction, there are still
many open questions, the first of which is whether top-K matchings can be identified
in polynomial time.
A model was proposed in Magnani et al. [ 2005 ] for combining semantic and
probabilistic attribute correspondences using constructs of uncertain semantic rela-
tionships in an ER model. An uncertain semantic relationship is a distribution of
beliefs over the set of all possible semantic relationships, using belief functions
[ Shafer 1976 ]. The set of possible semantic relations serves as the frame of
discernment (marked Θ), based on which two functions are defined, namely belief and
plausibility. Both functions assign a value to a subset of the frame of discernment,
starting from the basic probability mass that is assigned to each element in the
frame of discernment. The belief of a set A ∈ 2^Θ sums the probability mass of all
subsets B ⊆ A. The plausibility of a set A is the sum of the probability mass of all
subsets that intersect with A, i.e., all B such that A ∩ B ≠ ∅.
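The two definitions above can be illustrated with a minimal Python sketch. It assumes the standard Dempster-Shafer setting in which mass may be assigned to any non-empty subset of the frame; the relationship names and the mass values are hypothetical and are not taken from Magnani et al. [ 2005 ].

```python
def belief(mass, a):
    """Sum the mass of every focal set B that is a subset of A."""
    return sum(m for b, m in mass.items() if b <= a)


def plausibility(mass, a):
    """Sum the mass of every focal set B that has a non-empty intersection with A."""
    return sum(m for b, m in mass.items() if b & a)


# Hypothetical frame of discernment: three possible semantic relationships.
FRAME = frozenset({"equivalent", "subsumes", "disjoint"})

# Hypothetical basic probability assignment over subsets of the frame.
# The masses sum to 1 and no mass is placed on the empty set.
mass = {
    frozenset({"equivalent"}): 0.5,
    frozenset({"equivalent", "subsumes"}): 0.25,
    FRAME: 0.25,
}

a = frozenset({"equivalent"})
print(belief(mass, a))        # 0.5
print(plausibility(mass, a))  # 1.0
```

In this example the belief in "equivalent" is 0.5, because only the first focal set is contained in A, while its plausibility is 1.0, because every focal set intersects A. Belief never exceeds plausibility, so the pair can be read as a lower and an upper bound on the support for A.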
Combining semantic and probabilistic attribute correspondences (as proposed in
Magnani et al. [ 2005 ], for example) can be easily captured by the matrix
abstraction. In Sect. 4.2, we have outlined the way semantic attribute correspondences can
be captured using similarity matrices. When using the model proposed in Magnani
et al. [ 2005 ], the aggregator can be Dempster's combination rule [ Shafer 1976 ].
Consider now that entries in each such matrix are in [0, 1], reflecting the probability
(or plausibility) that this semantic attribute correspondence holds. This will open
a new challenge of querying a database that uses probabilistic semantic attribute
correspondences. First, the notion of querying using semantic attribute correspon-
dences should be examined carefully. Then analysis, similar to the analysis done
in Dong et al. [ 2007 ] and Gal et al. [ 2009 ], where possible worlds semantics was
carefully defined for probabilistic schema mapping, can be extended to the case of
probabilistic semantic attribute correspondences. It is worth noting that the analysis
in Magnani et al. [ 2005 ] described a de-facto set of possible worlds, each world
represented by a different ER schema.
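As an illustration of using Dempster's combination rule as the aggregator, the following Python sketch combines two mass functions for a single attribute pair, that is, for one matrix entry. It is a sketch under stated assumptions rather than the construction of Magnani et al. [ 2005 ]: the two matchers, the relationship names, and the mass values are all hypothetical.

```python
def dempster_combine(m1, m2):
    """Combine two mass functions with Dempster's rule of combination.

    Each mass function maps a frozenset (a focal set) to its mass.
    Products of masses whose focal sets do not intersect count as conflict,
    and the remaining mass is renormalized by 1 - conflict.
    """
    combined = {}
    conflict = 0.0
    for b, mass_b in m1.items():
        for c, mass_c in m2.items():
            intersection = b & c
            if intersection:
                combined[intersection] = combined.get(intersection, 0.0) + mass_b * mass_c
            else:
                conflict += mass_b * mass_c
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict; combination is undefined")
    norm = 1.0 - conflict
    return {focal: m / norm for focal, m in combined.items()}


# Hypothetical frame of discernment for one attribute pair.
FRAME = frozenset({"equivalent", "subsumes", "disjoint"})

# Hypothetical evidence from two matchers; the mass on FRAME is uncommitted belief.
matcher_1 = {frozenset({"equivalent"}): 0.5, FRAME: 0.5}
matcher_2 = {frozenset({"subsumes"}): 0.5, FRAME: 0.5}

result = dempster_combine(matcher_1, matcher_2)
for focal, m in result.items():
    print(sorted(focal), round(m, 3))
```

Here the two matchers disagree on a quarter of the total mass (the product of their masses on "equivalent" and "subsumes"), so the surviving three focal sets are each rescaled from 0.25 to one third. Applying the same function entry by entry would aggregate two whole matrices, assuming each entry stores a mass function rather than a single number.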
Contextual and by-tuple probabilistic attribute correspondences seem to be com-
plementary. A by-tuple probabilistic attribute correspondence represents a situation
in which there is uncertainty as to whether a given tuple should be interpreted
using one correspondence or the other. Contextual attribute correspondences model
exactly such knowledge. Therefore, a by-tuple probabilistic attribute correspondence
is needed whenever no information regarding the contextual attribute correspondence
is available. Whenever contextual attribute correspondences are gathered automatically,
using statistical methods as described in Sect. 3.2, another layer of uncertainty
is added to the modeling. Therefore, contextual attribute correspondences
should also be extended to provide probabilistic alternative versions.
Acknowledgments I thank Wenfei Fan, Pavel Shvaiko, Luna Dong, and Tomer Sagi for useful
comments. The views and conclusions contained in this chapter are those of the author.