Enhancing the Capabilities of Attribute Correspondences - Schema Matching and Mapping

steps of the S-Match algorithm. Steps 3 and 4 of S-Match generate a set of binary

matrices, where 1 represents relationship existence and 0 represents no relationship,

using some thresholds. During this process, and as part of a constraint enforcer, if

the same entry in two matrices is computed to be 1, a lattice of relationship strengths

determines which values are to remain 1 and which will be lowered to 0. As a final

step, any entry for which a 0 value is recorded in all matrices is assigned 1 in the idk

matrix. We observe that such modeling may be of much practical value, especially if

semantic matching is combined with quantity-based methods (e.g., based on string

matching) to create matcher ensembles.
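
The constraint-enforcement step described above can be sketched as follows. This is a minimal illustration, not the S-Match implementation: the relation names, the function name enforce_and_add_idk, and the total strength order in STRENGTH are assumptions made for the example (the text only says that a lattice of relationship strengths decides which entries stay 1).

```python
# Hypothetical strength order, strongest first. The source speaks of a
# lattice; a simple total order is assumed here for illustration only.
STRENGTH = ["equivalent", "more_general", "less_general", "disjoint"]


def enforce_and_add_idk(matrices):
    """Resolve conflicting 1-entries and derive the idk matrix.

    matrices maps every relation name in STRENGTH to a list of rows,
    each row a list of 0/1 values; all matrices share the same shape.
    """
    rows = len(matrices[STRENGTH[0]])
    cols = len(matrices[STRENGTH[0]][0])
    # Work on copies so the caller's matrices are left untouched.
    result = {rel: [row[:] for row in matrices[rel]] for rel in STRENGTH}
    idk = [[0] * cols for _ in range(rows)]

    for i in range(rows):
        for j in range(cols):
            present = [rel for rel in STRENGTH if result[rel][i][j] == 1]
            # Keep only the strongest relation; lower the others to 0.
            for weaker in present[1:]:
                result[weaker][i][j] = 0
            # No relation holds for this pair: record "I don't know".
            if not present:
                idk[i][j] = 1

    result["idk"] = idk
    return result


example = {
    "equivalent":   [[1, 0], [0, 0]],
    "more_general": [[1, 1], [0, 0]],
    "less_general": [[0, 0], [0, 0]],
    "disjoint":     [[0, 1], [0, 1]],
}
resolved = enforce_and_add_idk(example)
```

After this step every attribute pair carries exactly one 1 across the relation matrices and the idk matrix, which is what makes the output easy to combine with other matchers in an ensemble.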

5 Probabilistic Attribute Correspondences

There are many scenarios where a precise schema mapping may not be available.

For instance, a comparison search “bot” that tracks comparative prices from different

web sites has to determine, in real time, which attributes at a particular

location correspond to which attributes in a database at another URL. In many cases,

users querying two databases belonging to different organizations may not know

what the right schema mapping is. The common model of attribute correspondences

assumes a unique and deterministic correspondence for each attribute and

is thus incapable of modeling multiple possibilities.

Probabilistic attribute correspondences extend attribute correspondences by generating

multiple possible models, modeling uncertainty about which one is correct

by using probability theory. Such probabilities can then be combined to represent

possible schema mappings, based on which query processing can be performed.

Example 3. For illustration purposes, consider the case study from Sect. 2. We now

describe a scenario, which we dub semantic shift, according to which a relation in a

database, which was intended for one semantic use, changes its semantic role in the

organization database over the years. For example, the relation HotelCardInformation

was initially designed to hold information on RoomsRUs credit cards. Over the

years, the hotel chain has outsourced the management of its credit cards to an external

company, and as a result, the differentiation between hotel credit cards and other

credit cards became vague, and new credit cards may be inserted in some arbitrary

way into the two relations CardInformation and HotelCardInformation.

Probabilistic attribute correspondences can state that R.CardInfo.cardNum

matches S.CardInformation.cardNum with a probability of 0.7 and

S.HotelCardInformation.clientNum with a probability of 0.3.
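
One way to hold the correspondences of Example 3 in code is sketched below. The dictionary layout and the helper names validate and ranked are hypothetical. The check that probabilities sum to 1 assumes the alternatives are mutually exclusive and exhaustive, which holds for the 0.7/0.3 pair above but is not stated as a general requirement in the text.

```python
import math

# Source attribute -> {candidate target attribute: probability}.
correspondences = {
    "R.CardInfo.cardNum": {
        "S.CardInformation.cardNum": 0.7,
        "S.HotelCardInformation.clientNum": 0.3,
    },
}


def validate(candidates):
    """Check that the candidates form a probability distribution.

    Assumption: alternatives are exclusive and exhaustive (sum to 1).
    """
    if any(p < 0.0 or p > 1.0 for p in candidates.values()):
        raise ValueError("probabilities must lie in [0, 1]")
    if not math.isclose(sum(candidates.values()), 1.0):
        raise ValueError("probabilities must sum to 1")


def ranked(candidates):
    """Return (target, probability) pairs, most probable first."""
    return sorted(candidates.items(), key=lambda kv: kv[1], reverse=True)


card_num = correspondences["R.CardInfo.cardNum"]
validate(card_num)
best_target, best_prob = ranked(card_num)[0]
```

A deterministic matcher would keep only best_target. The probabilistic model keeps the whole ranked list, so the less likely clientNum alternative is still available to query processing.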

In the case of aggregate queries, this robust model provides not only

a ranking of the results but also the expected value of the aggregate query outcome

and the distribution of possible aggregate values.
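
The sketch below shows how an expected value and a distribution could be derived for a simple aggregate. It assumes that a single correspondence holds for the whole table, so the query is evaluated once per alternative and each answer is weighted by that alternative's probability. The column contents, the COUNT-style query, and the function names are made up for illustration.

```python
from collections import defaultdict


def aggregate_distribution(candidates, columns, aggregate):
    """Map each possible aggregate answer to its total probability.

    candidates: {target attribute: probability}
    columns:    {target attribute: list of values in that column}
    aggregate:  function applied to one column, e.g. len for COUNT
    """
    dist = defaultdict(float)
    for target, prob in candidates.items():
        # Alternatives that yield the same answer pool their probability.
        dist[aggregate(columns[target])] += prob
    return dict(dist)


def expected_value(dist):
    """Probability-weighted mean of the possible aggregate answers."""
    return sum(value * prob for value, prob in dist.items())


candidates = {
    "S.CardInformation.cardNum": 0.7,
    "S.HotelCardInformation.clientNum": 0.3,
}
# Hypothetical toy data: three rows in one relation, five in the other.
columns = {
    "S.CardInformation.cardNum": ["4111", "4222", "4333"],
    "S.HotelCardInformation.clientNum": ["9001", "9002", "9003", "9004", "9005"],
}

# COUNT(R.CardInfo.cardNum) evaluated under each possible correspondence.
dist = aggregate_distribution(candidates, columns, len)
mean = expected_value(dist)
```

Here the count is 3 with probability 0.7 and 5 with probability 0.3, so the expected count is 0.7 * 3 + 0.3 * 5, roughly 3.6. The dist dictionary is the distribution of possible aggregate values mentioned above, and sorting it by probability yields the ranking.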

The model of probabilistic attribute correspondences is based on the model of

probabilistic schema mapping [Dong et al. 2007], extending the concept of schema
