Modeling Uncertain Schema Matching - Uncertain Schema Matching


of attribute similarity [Berlin and Motro, 2001, Doan et al., 2001]. Finally, some researchers use

the experience of previous matchings as indicators of attribute similarity [He and Chang, 2005,

Madhavan et al., 2005, Su et al., 2006].

Example 3.3 To illustrate our model, and for the sake of completeness, we now present a few

examples of schema matchers, representative of many other, similar matchers. Detailed descriptions

of these matchers can be found in [Gal et al., 2005b] and [Marie and Gal, 2007a]:

Term: Term matching compares attribute names to identify syntactically similar attributes. To

achieve better performance, names are preprocessed using several techniques originating in

IR research. Term matching is based on either complete words or string comparison. As an

example, consider the relations CardInfo and HotelCardInformation, which we refer to as

compound attributes herein. The maximum common substring is CardInfo, and the similarity

of the two terms is length(CardInfo) / length(HotelCardInformation) = 8/20 = 40%.
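
The string-comparison variant of term matching can be sketched in Python as follows. This is a minimal illustration, not the matcher described in the cited papers: the function names are hypothetical, no IR-style preprocessing is applied, and dividing by the length of the longer name is an assumption inferred from the worked figures in this example (8/20 here, 2/11 for arrivalDate below).

```python
def longest_common_substring(a: str, b: str) -> str:
    """Return one longest contiguous substring shared by a and b."""
    best_len, best_end = 0, 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                # Extend the common run that ended at a[i-2], b[j-2].
                curr[j] = prev[j - 1] + 1
                if curr[j] > best_len:
                    best_len, best_end = curr[j], i
        prev = curr
    return a[best_end - best_len:best_end]


def term_similarity(a: str, b: str) -> float:
    """Common-substring length over the length of the longer name (assumed)."""
    if not a or not b:
        return 0.0
    return len(longest_common_substring(a, b)) / max(len(a), len(b))


print(term_similarity("CardInfo", "HotelCardInformation"))  # 0.4
```

The comparison is case sensitive, which is what makes Da (rather than a longer lowercase run) the common substring of arrivalDate and checkInDay.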

Value: Value matching utilizes domain constraints (e.g., drop lists, check boxes, and radio buttons).

It becomes valuable when comparing two attributes whose names do not match exactly.

For example, consider attributes arrivalDate and checkInDay. These two attributes have

associated value sets {(Select), 1, 2, ..., 31} and {(Day), 1, 2, ..., 31}, respectively, and thus their

content-based similarity is 31/33 = 94%, which is significantly higher than their term similarity

(length(Da) / length(arrivalDate) = 2/11 = 18%).
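
The figure 31/33 is consistent with dividing the number of shared values (31) by the number of distinct values in both sets combined (33), that is, a Jaccard-style ratio. The sketch below rests on that reading, which is an assumption; the function name value_similarity is hypothetical.

```python
def value_similarity(values_a, values_b) -> float:
    """Shared values divided by all distinct values (intersection over union)."""
    set_a, set_b = set(values_a), set(values_b)
    union = set_a | set_b
    if not union:
        return 0.0
    return len(set_a & set_b) / len(union)


# Value sets from the example: a placeholder entry plus the days 1..31.
arrival_date = ["(Select)"] + [str(day) for day in range(1, 32)]
check_in_day = ["(Day)"] + [str(day) for day in range(1, 32)]

print(round(value_similarity(arrival_date, check_in_day), 2))  # 0.94
```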

Composition: A composite attribute is composed of other attributes (either atomic or composite).

Composition can be translated into a hierarchy. This schema matcher assigns similarity to

attributes based on the similarity of their neighbors. The Cupid matcher [Madhavan et al.,

2001], for example, is based on attribute composition.
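
One simple way to let neighbors drive similarity is to score two composite attributes by how well their child attributes match. The sketch below is an illustrative assumption only and is not the Cupid algorithm: the function names, the exact-name base similarity, and the averaging rule are all hypothetical choices made for the example.

```python
def name_equal(x: str, y: str) -> float:
    """Hypothetical base similarity: 1.0 for equal names, ignoring case."""
    return 1.0 if x.lower() == y.lower() else 0.0


def neighbor_similarity(children_a, children_b, base_sim=name_equal) -> float:
    """Average, over the children of A, of the best match among B's children."""
    if not children_a or not children_b:
        return 0.0
    total = sum(max(base_sim(x, y) for y in children_b) for x in children_a)
    return total / len(children_a)


# Two composite attributes described by their (atomic) children.
print(neighbor_similarity(["city", "state"], ["City", "zip"]))  # 0.5
```

Note that this score is asymmetric in its two arguments; a symmetric variant could average the two directions.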

Precedence: The order in which data are provided in an interactive process is important. In

particular, data given at an earlier stage may restrict the options for a later entry. For example,

when filling in a form on a hotel reservation site, available room types can be determined using

the information given regarding location and check-in time. Once those entries are filled in,

the information is sent back to the server and the next form is brought up. Such precedence

relationships can usually be identified by the activation of a script, such as the one associated

with a SUBMIT button. Precedence relationships can be translated into a precedence graph.

The matching algorithm is based on a technique dubbed graph pivoting, as follows. When

matching two attributes, each is considered to be a pivot within its own schema, thus

partitioning the graph into a subgraph of all preceding attributes and a subgraph of all succeeding

attributes. By comparing preceding subgraphs and succeeding subgraphs, the confidence strength

of the pivot attributes is determined. Precedence was used by Su [2007] to determine attribute

correspondences with a holistic matcher.
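
The pivoting step can be sketched as below. The partition into preceding and succeeding attributes follows the description above; how the two pairs of subgraphs are compared is not specified here, so the Jaccard overlap of attribute names and the plain average are assumptions, as are all function names. The graphs are assumed to be acyclic and to use comparable attribute labels.

```python
def reachable(graph, start):
    """Nodes reachable from start along edges (start excluded if acyclic)."""
    seen, stack = set(), list(graph.get(start, ()))
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(graph.get(node, ()))
    return seen


def reverse(graph):
    """Return the graph with every edge direction flipped."""
    rev = {}
    for src, dsts in graph.items():
        rev.setdefault(src, [])
        for dst in dsts:
            rev.setdefault(dst, []).append(src)
    return rev


def pivot_partition(graph, pivot):
    """Split the graph around pivot: (all preceding, all succeeding)."""
    return reachable(reverse(graph), pivot), reachable(graph, pivot)


def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def precedence_confidence(graph_a, pivot_a, graph_b, pivot_b):
    """Average overlap of the preceding and succeeding sets (assumed rule)."""
    before_a, after_a = pivot_partition(graph_a, pivot_a)
    before_b, after_b = pivot_partition(graph_b, pivot_b)
    return (jaccard(before_a, before_b) + jaccard(after_a, after_b)) / 2


# Hypothetical precedence graphs: each attribute maps to what follows it.
site_a = {"location": ["checkIn"], "checkIn": ["roomType"], "roomType": []}
site_b = {"location": ["arrival"], "arrival": ["roomType"],
          "roomType": ["payment"], "payment": []}

print(precedence_confidence(site_a, "checkIn", site_b, "arrival"))  # 0.75
```

In this toy case both pivots are preceded only by location (overlap 1.0), while their successors overlap on roomType but not payment (overlap 0.5), giving 0.75.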
