Enhancing the Capabilities of Attribute Correspondences - Schema Matching and Mapping


(R.CardInfo.cardNum, S.HotelCardInfo.clientNum, R.CardInfo.type = 'RoomsRUs')

(R.CardInfo.cardNum, S.CardInfo.cardNum, R.CardInfo.type ≠ 'RoomsRUs')

Contextual attribute correspondences are useful in overcoming various aspects of structural heterogeneity. A typical example of such heterogeneity involves the designer's decision regarding the interpretation of subtypes. In the example above, database R was designed to include all credit card subtypes in a single relation, with type as a differentiating value. Database S refines this decision by allocating a separate relation to one of the subtypes.
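The two correspondences in the example above can be sketched as data. This is a minimal Python sketch, assuming that each context is a predicate over a single source tuple; the class name `ContextualCorrespondence`, the `route` helper, and the sample rows (including the `'Visa'` type) are hypothetical illustrations, not taken from the source.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]


@dataclass(frozen=True)
class ContextualCorrespondence:
    """Hypothetical representation: source attribute, target attribute, and a context."""
    source: str                      # e.g. "R.CardInfo.cardNum"
    target: str                      # e.g. "S.HotelCardInfo.clientNum"
    context: Callable[[Row], bool]   # selection condition evaluated on a source tuple
    description: str = ""


CORRESPONDENCES = [
    ContextualCorrespondence(
        "R.CardInfo.cardNum", "S.HotelCardInfo.clientNum",
        lambda r: r["type"] == "RoomsRUs", "R.CardInfo.type = 'RoomsRUs'"),
    ContextualCorrespondence(
        "R.CardInfo.cardNum", "S.CardInfo.cardNum",
        lambda r: r["type"] != "RoomsRUs", "R.CardInfo.type != 'RoomsRUs'"),
]


def route(rows: List[Row]) -> Dict[str, List[Any]]:
    """Send each source value to every target attribute whose context holds for its tuple."""
    out: Dict[str, List[Any]] = {c.target: [] for c in CORRESPONDENCES}
    for row in rows:
        for c in CORRESPONDENCES:
            if c.context(row):
                attr = c.source.rsplit(".", 1)[1]  # "cardNum"
                out[c.target].append(row[attr])
    return out


# Hypothetical instance of R.CardInfo: all card subtypes in one relation.
card_info = [
    {"cardNum": "1111", "type": "RoomsRUs"},
    {"cardNum": "2222", "type": "Visa"},
    {"cardNum": "3333", "type": "RoomsRUs"},
]
result = route(card_info)
print(result)
# {'S.HotelCardInfo.clientNum': ['1111', '3333'], 'S.CardInfo.cardNum': ['2222']}
```

Because the two contexts are complementary, every R.CardInfo tuple lands in exactly one of the two S relations, which mirrors how S splits off one subtype into its own relation.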

In Bohannon et al. [2006], a selection condition is defined as a logical condition, with the added benefit of serving as a basis for the schema mapping process [Barbosa et al. 2005; Bohannon et al. 2005; Fagin 2006; Fagin et al. 2007].
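To illustrate how a selection condition can serve as a basis for a mapping, the following is a simplified sketch, not the algorithm of Bohannon et al. [2006] or of the other cited mapping work: each contextual correspondence is turned into an `INSERT ... SELECT ... WHERE` statement and executed with Python's standard `sqlite3` module against in-memory databases attached under the names R and S. The `MAPPING` list, the `to_sql` helper, and the sample rows are hypothetical.

```python
import sqlite3

# Hypothetical mapping: (source table, source attr, target table, target attr, selection condition)
MAPPING = [
    ("R.CardInfo", "cardNum", "S.HotelCardInfo", "clientNum", "type = 'RoomsRUs'"),
    ("R.CardInfo", "cardNum", "S.CardInfo", "cardNum", "type <> 'RoomsRUs'"),
]


def to_sql(src_tbl: str, src_attr: str, tgt_tbl: str, tgt_attr: str, condition: str) -> str:
    """Turn one contextual correspondence into an INSERT ... SELECT ... WHERE statement."""
    return (f"INSERT INTO {tgt_tbl} ({tgt_attr}) "
            f"SELECT {src_attr} FROM {src_tbl} WHERE {condition}")


conn = sqlite3.connect(":memory:")
# Attach two empty in-memory databases so that names like R.CardInfo and S.CardInfo resolve.
conn.execute("ATTACH DATABASE ':memory:' AS R")
conn.execute("ATTACH DATABASE ':memory:' AS S")
conn.execute("CREATE TABLE R.CardInfo (cardNum TEXT, type TEXT)")
conn.execute("CREATE TABLE S.HotelCardInfo (clientNum TEXT)")
conn.execute("CREATE TABLE S.CardInfo (cardNum TEXT)")
conn.executemany("INSERT INTO R.CardInfo VALUES (?, ?)",
                 [("1111", "RoomsRUs"), ("2222", "Visa"), ("3333", "RoomsRUs")])

# The selection condition of each correspondence becomes the WHERE clause of the mapping query.
for entry in MAPPING:
    sql = to_sql(*entry)
    print(sql)
    conn.execute(sql)

hotel = [r[0] for r in conn.execute("SELECT clientNum FROM S.HotelCardInfo ORDER BY clientNum")]
cards = [r[0] for r in conn.execute("SELECT cardNum FROM S.CardInfo ORDER BY cardNum")]
print(hotel, cards)  # ['1111', '3333'] ['2222']
```

Real mapping systems generate far richer queries (joins, key generation, nesting); the point here is only that the context translates directly into a filter on the source.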

At the basis of contextual attribute correspondences is the use of instance values as a differentiator between possible correspondences. Therefore, the ability to identify contextual attribute correspondences depends on the ability of a matcher to take instance values into account. For example, the Term matching technique, introduced earlier, compares attribute names only and therefore will not change its estimate of the similarity of two attributes based on context. Instance values are used in many of the methods that apply machine learning techniques to schema matching. Autoplex [Berlin and Motro 2001], LSD [Doan et al. 2001], and iMAP [Dhamankar et al. 2004] use a naïve Bayes classifier to learn attribute correspondence probabilities from an instance training set. Also, sPLMap [Nottelmann and Straccia 2007] uses naïve Bayes, kNN, and KL-distance as content-based classifiers.
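The difference between a name-based and an instance-based matcher under a context can be illustrated with a small sketch. Here `difflib.SequenceMatcher` stands in for a name-based matcher such as Term, and the Jaccard overlap of value sets stands in for an instance-based matcher; neither is the naïve Bayes approach of the systems cited above, and the sample data is hypothetical. Restricting the source tuples to the context type = 'RoomsRUs' changes the instance-based score but cannot change the name-based one.

```python
from difflib import SequenceMatcher
from typing import Any, Callable, Dict, Iterable, List, Optional

Row = Dict[str, Any]


def term_similarity(a: str, b: str) -> float:
    """Name-based stand-in: looks only at attribute names, so context cannot affect it."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def instance_similarity(src_rows: List[Row], src_attr: str, tgt_values: Iterable[Any],
                        context: Optional[Callable[[Row], bool]] = None) -> float:
    """Jaccard overlap of instance values, optionally restricted to tuples satisfying a context."""
    src = {r[src_attr] for r in src_rows if context is None or context(r)}
    tgt = set(tgt_values)
    union = src | tgt
    return len(src & tgt) / len(union) if union else 0.0


# Hypothetical instances of R.CardInfo and S.HotelCardInfo.clientNum.
r_card_info = [
    {"cardNum": "1111", "type": "RoomsRUs"},
    {"cardNum": "2222", "type": "Visa"},
    {"cardNum": "3333", "type": "RoomsRUs"},
]
s_hotel_client_nums = ["1111", "3333"]


def roomsrus(r: Row) -> bool:
    return r["type"] == "RoomsRUs"


name_score = term_similarity("cardNum", "clientNum")  # identical with or without a context
no_ctx = instance_similarity(r_card_info, "cardNum", s_hotel_client_nums)
with_ctx = instance_similarity(r_card_info, "cardNum", s_hotel_client_nums, roomsrus)
print(round(no_ctx, 3), with_ctx)  # 0.667 1.0
```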

3.1 Modeling Contextual Attribute Correspondences

Contextual attribute correspondences are specified in terms of a condition on the value assignments of attributes. A $k$-context of an attribute correspondence is a condition that involves $k$ database attributes. For $k = 0$, a contextual attribute correspondence becomes a common attribute correspondence. For $k = 1$, the condition is simple, of the form $a = v$, where $a$ is an attribute and $v$ is a constant in $a$'s domain. For example, R.CardInfo.type = 'RoomsRUs'. Disjunctive, conjunctive, and general $k$-contexts generalize simple conditions in the usual way. For example, a simple disjunctive $k$-context for $k = 1$ is a condition of the form $a \in \{v_1, v_2, \ldots, v_m\}$, where each $v_i$ is a constant in $a$'s domain.
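A minimal sketch of $k$-contexts as condition objects, assuming tuples are Python dicts: $k$ is computed as the number of distinct attributes a condition mentions. The class names and the `country` attribute in the usage lines are hypothetical, chosen only to show a conjunctive 2-context.

```python
from dataclasses import dataclass
from typing import Any, Dict, FrozenSet, Protocol, Tuple

Row = Dict[str, Any]


class Condition(Protocol):
    def attributes(self) -> FrozenSet[str]: ...
    def holds(self, row: Row) -> bool: ...


@dataclass(frozen=True)
class TrueCond:
    """Empty condition: a 0-context, i.e., a common attribute correspondence."""
    def attributes(self) -> FrozenSet[str]:
        return frozenset()
    def holds(self, row: Row) -> bool:
        return True


@dataclass(frozen=True)
class Eq:
    """Simple condition a = v."""
    attr: str
    value: Any
    def attributes(self) -> FrozenSet[str]:
        return frozenset({self.attr})
    def holds(self, row: Row) -> bool:
        return row[self.attr] == self.value


@dataclass(frozen=True)
class In:
    """Simple disjunctive condition a in {v1, ..., vm}: still a 1-context."""
    attr: str
    values: FrozenSet[Any]
    def attributes(self) -> FrozenSet[str]:
        return frozenset({self.attr})
    def holds(self, row: Row) -> bool:
        return row[self.attr] in self.values


@dataclass(frozen=True)
class And:
    """Conjunction; its k is the number of distinct attributes across all parts."""
    parts: Tuple[Condition, ...]
    def attributes(self) -> FrozenSet[str]:
        return frozenset().union(*(p.attributes() for p in self.parts))
    def holds(self, row: Row) -> bool:
        return all(p.holds(row) for p in self.parts)


def k_of(cond: Condition) -> int:
    return len(cond.attributes())


row = {"type": "RoomsRUs", "country": "US"}
simple = Eq("type", "RoomsRUs")
disj = In("type", frozenset({"RoomsRUs", "Visa"}))
conj = And((Eq("type", "RoomsRUs"), Eq("country", "US")))
print(k_of(TrueCond()), k_of(simple), k_of(disj), k_of(conj))  # 0 1 1 2
print(simple.holds(row), disj.holds(row), conj.holds(row))      # True True True
```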

Contextual attribute correspondences can be modeled with similarity matrices. An entry in the similarity matrix, $M_{i,j}$, is extended to be a tuple $\langle v, c \rangle$, where $v \in [0, 1]$ is a similarity value and $c$ is a context as defined above. This modeling allows a smooth extension of contextual attribute correspondences to matcher ensembles [Domshlak et al. 2007; He and Chang 2005], in which matchers are combined to improve the quality of the outcome of the matching process. For example, Do et al. [2002] and Domshlak et al. [2007] proposed several ways to combine similarity matrices, generated by different matchers, into a single matrix. Such combination, which was based solely on aggregating similarity scores, can be extended
