3.1.2 ATTRIBUTE CORRESPONDENCES AND THE SIMILARITY MATRIX
Let $S$ and $S'$ be schemata with $n$ and $n'$ attributes, respectively.¹ Let $\mathcal{S} = S \times S'$
be the set of all possible attribute correspondences between $S$ and $S'$. $\mathcal{S}$ is a set of
attribute pairs (e.g., (arrivalDate, checkInDay)). Let $M_{S,S'}$ be an $n \times n'$ similarity matrix
over $\mathcal{S}$, where $M_{i,j}$ represents a degree of similarity between the $i$-th attribute of $S$
and the $j$-th attribute of $S'$. The majority of works in the schema matching literature define
$M_{i,j}$ to be a real number in $(0, 1)$. $M_{S,S'}$ is a binary similarity matrix if
$M_{i,j} \in \{0, 1\}$ for all $1 \le i \le n$ and $1 \le j \le n'$. That is, a binary similarity
matrix accepts only 0 and 1 as possible values.
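The definitions above can be sketched in a few lines of Python. This is only an illustration: the function names (correspondences, similarity_matrix, is_binary) and the toy exact_name similarity measure are assumptions made for the example and do not come from the text. The attribute names are those of the running case study.

```python
from itertools import product


def correspondences(s, s_prime):
    """Return S x S': every possible attribute pair between the two schemata."""
    return list(product(s, s_prime))


def similarity_matrix(s, s_prime, sim):
    """Build an n x n' matrix where entry [i][j] is sim(s[i], s_prime[j])."""
    return [[sim(a, b) for b in s_prime] for a in s]


def is_binary(matrix):
    """True if every entry of the matrix is exactly 0 or 1."""
    return all(value in (0, 1) for row in matrix for value in row)


def exact_name(a, b):
    """Hypothetical toy measure: 1 if the names are identical, else 0."""
    return 1 if a == b else 0


S = ["cardNum", "city", "arrivalDate", "departureDate"]
S_PRIME = ["clientNum", "city", "checkInDay", "checkOutDay"]

pairs = correspondences(S, S_PRIME)            # 4 * 4 attribute pairs
M = similarity_matrix(S, S_PRIME, exact_name)  # rows follow S, columns follow S'
print(len(pairs), is_binary(M))
```

Because exact_name only ever returns 0 or 1, the resulting matrix is binary by construction; a graded measure would instead yield real-valued entries.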
Table 3.2: A Similarity Matrix Example
S2 ↓ / S1 →      1 cardNum    2 city    3 arrivalDate    4 departureDate
1 clientNum      0.843        0.323     0.317            0.302
2 city           0.290        1.000     0.326            0.303
3 checkInDay     0.344        0.328     0.351            0.352
4 checkOutDay    0.312        0.310     0.359            0.356
Table 3.3: A Binary Similarity Matrix Example
S2 ↓ / S1 →      1 cardNum    2 city    3 arrivalDate    4 departureDate
1 clientNum      1            0         0                0
2 city           0            1         0                0
3 checkInDay     0            0         0                1
4 checkOutDay    0            0         1                0
Example 3.2 Consider Tables 3.2 and 3.3, representing simplified similarity matrices of the running
case study. The similarity matrix in Table 3.2 is a simplified version of the matching between two
schemata of Example 3.1. The similarity matrix in Table 3.3 is a binary similarity matrix. Matrix
elements are given using both attribute names and numbers.
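As a rough illustration of how the two tables relate, the sketch below stores the values of Table 3.2 as a nested list and derives a binary matrix by marking the largest entry in each row. The row-maximum rule and the name row_max_binary are assumptions made for this example; the text does not say how Table 3.3 was obtained, although this rule happens to reproduce it.

```python
# Values of Table 3.2: rows are S2 attributes, columns are S1 attributes.
S1 = ["cardNum", "city", "arrivalDate", "departureDate"]
S2 = ["clientNum", "city", "checkInDay", "checkOutDay"]
TABLE_3_2 = [
    [0.843, 0.323, 0.317, 0.302],  # clientNum
    [0.290, 1.000, 0.326, 0.303],  # city
    [0.344, 0.328, 0.351, 0.352],  # checkInDay
    [0.312, 0.310, 0.359, 0.356],  # checkOutDay
]


def row_max_binary(matrix):
    """Assumed rule: set the largest entry of each row to 1, all others to 0."""
    binary = []
    for row in matrix:
        best = row.index(max(row))  # first position of the row maximum
        binary.append([1 if j == best else 0 for j in range(len(row))])
    return binary


binary = row_max_binary(TABLE_3_2)
for name, row in zip(S2, binary):
    print(f"{name:12s}", row)
```

Under this rule checkInDay is paired with departureDate and checkOutDay with arrivalDate, which is the pairing shown in Table 3.3.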
Similarity matrices are generated by schema matchers. Schema matchers are instantiations of
the schema matching process. They differ mainly in the measures of similarity they employ, which
yield different similarity matrices. These measures can be arbitrarily complex, and may use various
techniques. Some matchers assume similar attributes are more likely to have similar names
[He and Chang, 2003, Su et al., 2006]. Other matchers assume similar attributes share similar domains
[Gal et al., 2005b, Madhavan et al., 2001]. Others yet take instance similarity as an indication
¹ For ease of exposition, we constrain our presentation to a matching process involving two schemata. Extensions to holistic schema
matching are discussed in Section 3.2.
 