Enhancing the Capabilities of Attribute Correspondences - Schema Matching and Mapping

3.2 Finding Contextual Attribute Correspondences

A few challenges arise when designing an algorithm for finding contextual attribute correspondences. First, one risks overfitting the correspondences to the training data. For example, one could find a contextual attribute correspondence stating

(R.CardInfo.expiryMonth, S.HotelCardInformation.expiryMonth, R.CardInfo.securityCode > 333),

which is clearly inappropriate, since the security code is associated with the card number and not with its expiry. A naïve classifier may fall into this trap simply because of some bias in the training dataset that over-represents cards with higher values of the securityCode attribute.

A second challenge involves situations in which the contextual attribute correspondences are not specializations of (noncontextual) attribute correspondences and therefore cannot be identified as refinements of the outcome of existing matchers. As an example, consider our case study application. R.HotelInfo.neighborhood provides neighborhood information for medium-size cities. For bigger cities, however, the application prefers a more accurate positioning of the hotel and uses subway station names as the neighborhood information. Therefore, a possible contextual attribute correspondence may be

(R.HotelInfo.neighborhood, T.Subway.station, R.HotelInfo.city = 'Moscow').

However, this is not a refinement of an attribute correspondence (R.HotelInfo.neighborhood, T.Subway.station).
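The second challenge can be illustrated with a minimal sketch. It assumes a toy value-overlap score rather than any matcher from the literature, and the rows, station names, and the `overlap` helper are all hypothetical. Over all hotels the correspondence between neighborhood and subway station scores poorly, so a noncontextual matcher would be unlikely to propose it, yet restricted to city = 'Moscow' it scores well.

```python
# Hypothetical toy data: hotel rows and a set of subway station names.
hotels = [
    {"city": "Moscow", "neighborhood": "Arbatskaya"},
    {"city": "Moscow", "neighborhood": "Taganskaya"},
    {"city": "Haifa", "neighborhood": "Carmel"},
    {"city": "Haifa", "neighborhood": "Hadar"},
    {"city": "Graz", "neighborhood": "Lend"},
    {"city": "Graz", "neighborhood": "Gries"},
]
stations = {"Arbatskaya", "Taganskaya", "Kurskaya"}


def overlap(rows, attribute, target_values):
    """Fraction of rows whose attribute value appears in target_values."""
    if not rows:
        return 0.0
    hits = sum(1 for row in rows if row[attribute] in target_values)
    return hits / len(rows)


# Noncontextual score: every hotel row is considered.
score_all = overlap(hotels, "neighborhood", stations)

# Contextual score: only rows satisfying the condition city = 'Moscow'.
moscow_rows = [row for row in hotels if row["city"] == "Moscow"]
score_moscow = overlap(moscow_rows, "neighborhood", stations)
```

In this toy instance only one third of all neighborhood values are station names, while every Moscow value is, which is why the contextual correspondence cannot be reached by refining a correspondence that the matcher never reported.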

An approach for discovering contextual matches was introduced in Bohannon et al. [2006]. Let $M_{i,j}$ be the score of matching attributes $S.A_i$ with $S.A_j$. Given a condition $c$, a matcher can use the subset of the instance problem that satisfies $c$ to provide a new score $M^{c}_{i,j}$. The difference $M^{c}_{i,j} - M_{i,j}$ is the improvement of the contextual attribute correspondence. Given the set of conditions $C$, we can create a contextual attribute correspondence using the condition $c \in C$ that maximizes the improvement measure. Using an improvement threshold can solve the overfitting challenge. However, thresholds are always tricky. A threshold that is set too low introduces false positives, while a threshold that is too high may introduce false negatives. Using machine learning techniques to tune thresholds has proven to be effective in schema matching [Lee et al. 2007]. However, as was shown in Gal [2006], it is impossible to set thresholds that will avoid this false negative/false positive trade-off.
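The selection step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the algorithm of Bohannon et al.: conditions are restricted to simple equalities, the matcher is any function that scores a list of rows, and the names `best_condition`, `station_overlap`, `ROWS`, and `STATIONS` are hypothetical.

```python
from typing import Callable, Dict, List, Optional, Tuple

Row = Dict[str, str]
Condition = Tuple[str, str]  # (attribute, value), read as "attribute = value"


def best_condition(
    rows: List[Row],
    conditions: List[Condition],
    score: Callable[[List[Row]], float],
    threshold: float,
) -> Optional[Tuple[Condition, float]]:
    """Return the condition with the largest improvement above threshold, or None."""
    base = score(rows)  # score without any condition
    best: Optional[Tuple[Condition, float]] = None
    for attribute, value in conditions:
        subset = [row for row in rows if row.get(attribute) == value]
        if not subset:
            continue  # a condition that no instance satisfies cannot be evaluated
        improvement = score(subset) - base
        if improvement > threshold and (best is None or improvement > best[1]):
            best = ((attribute, value), improvement)
    return best


# Hypothetical toy instance and matcher score (overlap with station names).
STATIONS = {"Arbatskaya", "Taganskaya"}
ROWS = [
    {"city": "Moscow", "neighborhood": "Arbatskaya"},
    {"city": "Moscow", "neighborhood": "Taganskaya"},
    {"city": "Haifa", "neighborhood": "Carmel"},
    {"city": "Haifa", "neighborhood": "Hadar"},
]


def station_overlap(rows: List[Row]) -> float:
    """Fraction of rows whose neighborhood is a known station name."""
    return sum(row["neighborhood"] in STATIONS for row in rows) / len(rows)


CONDITIONS = [("city", "Moscow"), ("city", "Haifa")]
chosen = best_condition(ROWS, CONDITIONS, station_overlap, threshold=0.2)
rejected = best_condition(ROWS, CONDITIONS, station_overlap, threshold=0.6)
```

With the lower threshold the condition city = 'Moscow' is selected. With the higher threshold the same condition is discarded, a small instance of the false negative risk noted in the text.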

It has been proposed in Bohannon et al. [2006] that k-contexts with k > 1 will yield more trustworthy contextual attribute correspondences. The algorithm first determines an initial list of 1-context conditions. Then, it creates and evaluates disjunctive conditions that are generated from the original 1-context conditions. The generation of conditions is carried out using view selection. Views are chosen
