of attribute similarity [Berlin and Motro, 2001, Doan et al., 2001]. Finally, some researchers use
the experience of previous matchings as indicators of attribute similarity [He and Chang, 2005,
Madhavan et al., 2005, Su et al., 2006].
Example 3.3 To illustrate our model, and for the sake of completeness, we now present a few
examples of schema matchers, representative of many other, similar matchers. Detailed descriptions
of these matchers can be found in [Gal et al., 2005b] and [Marie and Gal, 2007a]:
Term: Term matching compares attribute names to identify syntactically similar attributes. To
achieve better performance, names are preprocessed using several techniques originating in
IR research. Term matching is based on either complete words or string comparison. As an
example, consider the relations CardInfo and HotelCardInformation, which we refer to as
compound attributes herein. The maximum common substring is CardInfo, and the similarity
of the two terms is $\frac{\text{length}(\text{CardInfo})}{\text{length}(\text{HotelCardInformation})} = \frac{8}{20} = 40\%$.
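The term-similarity computation above can be sketched in a few lines of Python. This is a minimal sketch, not the implementation described in [Gal et al., 2005b]: it skips the IR-style preprocessing, compares names case-sensitively, and assumes the shared-substring length is divided by the length of the longer name. That normalization reproduces both worked figures in this example (8/20 here, and 2/11 for arrivalDate vs. checkInDay). The helper names `longest_common_substring` and `term_similarity` are hypothetical.

```python
def longest_common_substring(a: str, b: str) -> int:
    """Length of the longest contiguous substring shared by a and b (dynamic programming)."""
    best = 0
    prev = [0] * (len(b) + 1)  # prev[j]: length of the common suffix of a[:i-1] and b[:j]
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                cur[j] = prev[j - 1] + 1  # extend the common suffix by one character
                best = max(best, cur[j])
        prev = cur
    return best


def term_similarity(a: str, b: str) -> float:
    """Shared-substring length over the longer name's length (assumed normalization)."""
    longer = max(len(a), len(b))
    return longest_common_substring(a, b) / longer if longer else 0.0


if __name__ == "__main__":
    print(round(100 * term_similarity("CardInfo", "HotelCardInformation")))  # 40
    print(round(100 * term_similarity("arrivalDate", "checkInDay")))  # 18
```

A preprocessing step (for example, lowercasing or splitting compound names into words) would be applied to both names before calling `term_similarity`; that is where the IR techniques mentioned above would plug in.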
Value: Value matching utilizes domain constraints (e.g., drop lists, check boxes, and radio buttons).
It becomes valuable when comparing two attributes whose names do not match exactly. For example,
consider attributes arrivalDate and checkInDay. These two attributes have associated value sets
$\{\text{(Select)}, 1, 2, \ldots, 31\}$ and $\{\text{(Day)}, 1, 2, \ldots, 31\}$, respectively. The two
sets share 31 of their 33 distinct values, and thus their content-based similarity is
$\frac{31}{33} = 94\%$, which is significantly higher than their term similarity
$\frac{2\ (\text{Da})}{11\ (\text{arrivalDate})} = 18\%$.
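The 31/33 figure is consistent with dividing the number of shared values by the number of distinct values in either set (a Jaccard ratio). The sketch below assumes that ratio; the function name `value_similarity` and the encoding of drop-list entries as strings are illustrative assumptions, not part of the original matcher.

```python
from typing import Iterable


def value_similarity(values_a: Iterable[str], values_b: Iterable[str]) -> float:
    """Shared values divided by all distinct values (assumed Jaccard ratio)."""
    a, b = set(values_a), set(values_b)
    union = a | b
    if not union:
        return 0.0  # no domain information on either side
    return len(a & b) / len(union)


if __name__ == "__main__":
    days = [str(d) for d in range(1, 32)]  # drop-list entries 1..31
    arrival_date = ["(Select)"] + days
    check_in_day = ["(Day)"] + days
    print(round(100 * value_similarity(arrival_date, check_in_day)))  # 94
```

The placeholder entries "(Select)" and "(Day)" are what keep the score below 100%. A matcher that stripped such placeholders first would report the two domains as identical.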
Composition: A composite attribute is composed of other attributes (either atomic or composite).
Composition can be translated into a hierarchy. This schema matcher assigns similarity to
attributes based on the similarity of their neighbors. The Cupid matcher [Madhavan et al.,
2001], for example, is based on attribute composition.
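The text does not give a formula for composition-based similarity. The following is a simplified, hypothetical sketch. It treats an attribute's "neighbors" as the leaf attributes beneath it in the hierarchy, then scores two composite attributes by the fraction of leaves on both sides that have a match at or above a threshold. It is not the Cupid algorithm. The hierarchy encoding, the `threshold` parameter, the exact-name base matcher, and the child attribute names in the usage are all assumptions.

```python
from typing import Callable, Dict, List

Tree = Dict[str, List[str]]  # composite attribute -> its child attributes (hypothetical encoding)


def leaves(tree: Tree, node: str) -> List[str]:
    """Atomic attributes under node; an attribute with no children is its own leaf."""
    children = tree.get(node, [])
    if not children:
        return [node]
    out: List[str] = []
    for child in children:
        out.extend(leaves(tree, child))
    return out


def composition_similarity(t1: Tree, n1: str, t2: Tree, n2: str,
                           sim: Callable[[str, str], float],
                           threshold: float = 0.5) -> float:
    """Fraction of leaves on both sides with a counterpart scoring >= threshold."""
    l1, l2 = leaves(t1, n1), leaves(t2, n2)
    matched1 = sum(1 for x in l1 if any(sim(x, y) >= threshold for y in l2))
    matched2 = sum(1 for y in l2 if any(sim(x, y) >= threshold for x in l1))
    return (matched1 + matched2) / (len(l1) + len(l2))


if __name__ == "__main__":
    exact = lambda a, b: 1.0 if a.lower() == b.lower() else 0.0  # toy base matcher
    t1 = {"CardInfo": ["cardNumber", "expiry"]}
    t2 = {"HotelCardInformation": ["cardNumber", "expiry", "holder"]}
    print(composition_similarity(t1, "CardInfo", t2, "HotelCardInformation", exact))  # 0.8
```

In practice the base matcher `sim` would itself be a term or value matcher such as the ones in this example, so that composition refines their scores rather than replacing them.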
Precedence: The order in which data are provided in an interactive process is important. In
particular, data given at an earlier stage may restrict the options for a later entry. For example,
when filling in a form on a hotel reservation site, available room types can be determined using
the information given regarding location and check-in time. Once those entries are filled in,
the information is sent back to the server and the next form is brought up. Such precedence
relationships can usually be identified by the activation of a script, such as the one associated
with a SUBMIT button. Precedence relationships can be translated into a precedence graph.
The matching algorithm is based on a technique dubbed graph pivoting, as follows. When
matching two attributes, each is considered to be a pivot within its own schema, thus
partitioning the graph into a subgraph of all preceding attributes and a subgraph of all
succeeding attributes. By comparing the two preceding subgraphs with each other and the two
succeeding subgraphs with each other, the confidence strength of the pivot attributes is
determined. Precedence was used by Su [2007] to determine attribute correspondences with
a holistic matcher.
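Graph pivoting can be sketched as follows, assuming a precedence graph stored as an adjacency list in which each attribute maps to the attributes that directly follow it. The text does not say how the preceding and succeeding subgraphs are compared, so this sketch makes a hypothetical choice. It compares the attribute sets reachable before and after each pivot using a symmetric best-match average under a pluggable base matcher, then averages the two scores. All function names and the sample forms are illustrative.

```python
from typing import Callable, Dict, List, Set

Graph = Dict[str, List[str]]  # attribute -> attributes that directly follow it
Sim = Callable[[str, str], float]


def reachable(graph: Graph, start: str) -> Set[str]:
    """All attributes reachable from start by following edges."""
    seen: Set[str] = set()
    stack = list(graph.get(start, []))
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(graph.get(node, []))
    seen.discard(start)  # a pivot never counts as its own predecessor or successor
    return seen


def reverse(graph: Graph) -> Graph:
    """Flip every edge so that reachability yields predecessors instead of successors."""
    rev: Graph = {}
    for src, dsts in graph.items():
        for dst in dsts:
            rev.setdefault(dst, []).append(src)
    return rev


def set_similarity(xs: Set[str], ys: Set[str], sim: Sim) -> float:
    """Hypothetical symmetric best-match average; two empty sides count as agreeing."""
    if not xs and not ys:
        return 1.0
    if not xs or not ys:
        return 0.0
    forward = sum(max(sim(x, y) for y in ys) for x in xs) / len(xs)
    backward = sum(max(sim(x, y) for x in xs) for y in ys) / len(ys)
    return (forward + backward) / 2


def pivot_confidence(g1: Graph, p1: str, g2: Graph, p2: str, sim: Sim) -> float:
    """Compare preceding subgraphs and succeeding subgraphs of the two pivots."""
    before = set_similarity(reachable(reverse(g1), p1), reachable(reverse(g2), p2), sim)
    after = set_similarity(reachable(g1, p1), reachable(g2, p2), sim)
    return (before + after) / 2


if __name__ == "__main__":
    exact = lambda a, b: 1.0 if a.lower() == b.lower() else 0.0  # toy base matcher
    form1 = {"city": ["checkIn"], "checkIn": ["roomType"]}
    form2 = {"city": ["checkIn"], "checkIn": ["room"]}
    print(pivot_confidence(form1, "checkIn", form2, "checkIn", exact))  # 0.5
```

With the exact-name matcher, the predecessors of the two checkIn pivots agree fully while their successors (roomType vs. room) do not, so the confidence is 0.5. A softer base matcher, such as a term matcher, would raise the second component.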
 