steps of the S-Match algorithm. Steps 3 and 4 of S-Match generate a set of binary
matrices, where 1 represents relationship existence and 0 represents no relationship,
using some thresholds. During this process, and as part of a constraint enforcer, if
the same entry in two matrices is computed to be 1, a lattice of relationship strengths
determines which values are to remain 1 and which will be lowered to 0. As a final
step, any entry for which a 0 value is recorded in all matrices is assigned 1 in the idk
matrix. We observe that such modeling may be of much practical value, especially if
semantic matching is combined with quantity-based methods (e.g., based on string
matching) to create matcher ensembles.
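The constraint-enforcement step described above can be sketched in a few lines of Python. This is only an illustrative sketch: the relation names and the strength order in STRENGTH_ORDER are assumptions made for the example (the text speaks of a lattice, which is simplified here to a total order), and enforce_constraints is a hypothetical helper, not part of S-Match.

```python
# Hypothetical strength order, strongest first. The real lattice used by
# S-Match is not given in the text; a total order is assumed for simplicity.
STRENGTH_ORDER = ["equivalent", "more_general", "less_general", "disjoint"]


def enforce_constraints(matrices):
    """Keep at most one 1 per entry across the relation matrices.

    matrices maps each relation name in STRENGTH_ORDER to a binary matrix
    (list of lists of 0/1); all matrices share the same shape.
    Returns new matrices plus an "idk" matrix for entries with no relation.
    """
    first = matrices[STRENGTH_ORDER[0]]
    rows, cols = len(first), len(first[0])
    # Copy the input so the caller's matrices are left untouched.
    result = {rel: [row[:] for row in matrices[rel]] for rel in STRENGTH_ORDER}
    idk = [[0] * cols for _ in range(rows)]

    for i in range(rows):
        for j in range(cols):
            winner_found = False
            for rel in STRENGTH_ORDER:
                if result[rel][i][j] == 1:
                    if winner_found:
                        # A stronger relation already holds: lower to 0.
                        result[rel][i][j] = 0
                    else:
                        winner_found = True
            if not winner_found:
                # 0 in every matrix: record "I don't know".
                idk[i][j] = 1

    result["idk"] = idk
    return result


example = {
    "equivalent":   [[1, 0], [0, 0]],
    "more_general": [[1, 1], [0, 0]],
    "less_general": [[0, 0], [0, 0]],
    "disjoint":     [[0, 1], [0, 1]],
}
enforced = enforce_constraints(example)
```

After enforcement, every entry is 1 in exactly one of the matrices, including idk.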
5 Probabilistic Attribute Correspondences
There are many scenarios where a precise schema mapping may not be available.
For instance, a comparison search “bot” that tracks comparative prices from different
web sites has to determine, in real time, which attributes at a particular
location correspond to which attributes in a database at another URL. In many cases,
users querying two databases belonging to different organizations may not know
what the right schema mapping is. The common model of attribute correspondences
assumes a unique and deterministic correspondence for each attribute and
is thus incapable of modeling multiple possibilities.
Probabilistic attribute correspondences extend attribute correspondences by generating
multiple possible models, modeling uncertainty about which one is correct
by using probability theory. Such probabilities can then be combined to represent
possible schema mappings, based on which query processing can be performed.
Example 3. For illustration purposes, consider the case study from Sect. 2. We now
describe a scenario, which we dub semantic shift, in which a relation in a
database, which was intended for one semantic use, changes its semantic role in the
organization database over the years. For example, the relation HotelCardInformation
was initially designed to hold information on RoomsRUs credit cards. Over the
years, the hotel chain has outsourced the management of its credit cards to an external
company, and as a result, the differentiation between hotel credit cards and other
credit cards became vague, and new credit cards may be inserted in some arbitrary
way into the two relations CardInformation and HotelCardInformation.
Probabilistic attribute correspondences can state that R.CardInfo.cardNum
matches S.CardInformation.cardNum with a probability of 0.7 and
S.HotelCardInformation.clientNum with a probability of 0.3.
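One possible way to represent the correspondences of Example 3 in code is shown below. It is a minimal sketch, not the book's formalism: the Correspondence class and the helpers is_distribution and most_likely are hypothetical names, and the requirement that the alternatives for one attribute sum to 1 is an assumption made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Correspondence:
    """One candidate match for a source attribute (hypothetical type)."""
    source: str
    target: str
    probability: float


def is_distribution(correspondences, tol=1e-9):
    """Check that the alternatives' probabilities sum to 1 (an assumption)."""
    total = sum(c.probability for c in correspondences)
    return abs(total - 1.0) <= tol


def most_likely(correspondences):
    """Return the alternative with the highest probability."""
    return max(correspondences, key=lambda c: c.probability)


# The two alternatives for R.CardInfo.cardNum from Example 3.
card_num = [
    Correspondence("R.CardInfo.cardNum", "S.CardInformation.cardNum", 0.7),
    Correspondence("R.CardInfo.cardNum", "S.HotelCardInformation.clientNum", 0.3),
]

best = most_likely(card_num)
```

A deterministic matcher would keep only best; the probabilistic model retains both alternatives together with their probabilities.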
This robust model makes it possible, in the case of aggregate queries, to provide not
only a ranking of the results but also the expected value of the aggregate query outcome
and the distribution of possible aggregate values.
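To make the expected value and the distribution of an aggregate concrete, the sketch below evaluates a COUNT query under each alternative correspondence. The column contents are invented toy data, the helper names are hypothetical, and it is assumed that a single alternative applies to the whole table; only the probabilities 0.7 and 0.3 come from Example 3.

```python
from collections import defaultdict


def aggregate_distribution(alternatives, aggregate):
    """Map each possible aggregate value to its total probability.

    alternatives is a list of (column_values, probability) pairs, one per
    candidate correspondence; aggregate is a function over a value list.
    """
    dist = defaultdict(float)
    for values, prob in alternatives:
        dist[aggregate(values)] += prob
    return dict(dist)


def expected_value(dist):
    """Probability-weighted mean of the possible aggregate values."""
    return sum(value * prob for value, prob in dist.items())


# Toy column contents (invented for illustration only).
card_information_card_num = ["1111", "2222", "3333"]
hotel_card_information_client_num = ["4444", "5555"]

alternatives = [
    (card_information_card_num, 0.7),          # S.CardInformation.cardNum
    (hotel_card_information_client_num, 0.3),  # S.HotelCardInformation.clientNum
]

count_dist = aggregate_distribution(alternatives, len)
count_expected = expected_value(count_dist)
```

Here the count is 3 with probability 0.7 and 2 with probability 0.3, so the expected count is about 2.7.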
The model of probabilistic attribute correspondences is based on the model of
probabilistic schema mapping [Dong et al. 2007], extending the concept of schema