pname   email-addr   current-addr    permanent-addr
Alice   alice@       Mountain View   Sunnyvale
Bob     bob@         Sunnyvale       Sunnyvale
(a)

name    email    mailing-addr    home-addr   office-addr
Alice   alice@   Mountain View   Sunnyvale   office
Bob     bob@     Sunnyvale       Sunnyvale   office
(b)

name    email    mailing-addr   home-addr       office-addr
Alice   alice@   Sunnyvale      Mountain View   office
Bob     email    bob@           Sunnyvale       office
(c)

Tuple (mailing-addr)   Prob
('Sunnyvale')          0.9
('Mountain View')      0.5
('alice@')             0.1
('bob@')               0.1
(d)

Tuple (mailing-addr)   Prob
('Sunnyvale')          0.94
('Mountain View')      0.5
('alice@')             0.1
('bob@')               0.1
(e)
Fig. 4.3 Example 3: (a) a source instance $D_S$; (b) a target instance that is by-table consistent with $D_S$ and $m_1$; (c) a target instance that is by-tuple consistent with $D_S$ and $\langle m_2, m_3 \rangle$; (d) $Q^{table}(D_S)$; (e) $Q^{tuple}(D_S)$
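
The difference between the answer probabilities in panels (d) and (e) can be checked with a small sketch. It rests on several assumptions that are not stated in this excerpt: the query returns the distinct mailing-addr values; the three candidate mappings send current-addr, permanent-addr, and email-addr to mailing-addr with probabilities 0.5, 0.4, and 0.1 (illustrative values chosen to be consistent with the figure); and, under the by-tuple reading, each tuple picks its mapping independently. The names SOURCE, CANDIDATES, by_table, and by_tuple are hypothetical.

```python
from collections import defaultdict
from itertools import product

# Source instance from panel (a).
SOURCE = [
    {"pname": "Alice", "email-addr": "alice@",
     "current-addr": "Mountain View", "permanent-addr": "Sunnyvale"},
    {"pname": "Bob", "email-addr": "bob@",
     "current-addr": "Sunnyvale", "permanent-addr": "Sunnyvale"},
]

# ASSUMPTION: (source attribute mapped to mailing-addr, probability).
# The probabilities are illustrative and not given in this excerpt.
CANDIDATES = [("current-addr", 0.5), ("permanent-addr", 0.4), ("email-addr", 0.1)]


def by_table(source, candidates):
    """One mapping applies to the whole table."""
    probs = defaultdict(float)
    for attr, p in candidates:
        # Each distinct answer under this mapping receives its probability once.
        for value in {row[attr] for row in source}:
            probs[value] += p
    return dict(probs)


def by_tuple(source, candidates):
    """Each tuple picks its own mapping (assumed independent)."""
    probs = defaultdict(float)
    # Enumerate every sequence assigning one candidate mapping per tuple.
    for choice in product(candidates, repeat=len(source)):
        weight = 1.0
        answers = set()
        for row, (attr, p) in zip(source, choice):
            weight *= p
            answers.add(row[attr])
        for value in answers:
            probs[value] += weight
    return dict(probs)


if __name__ == "__main__":
    for name, fn in (("by-table", by_table), ("by-tuple", by_tuple)):
        result = fn(SOURCE, CANDIDATES)
        print(name, {v: round(p, 2) for v, p in sorted(result.items())})
```

Under these assumed probabilities, 'Sunnyvale' receives 0.9 under the by-table reading and 0.94 under the by-tuple reading, which agrees with panels (d) and (e). The enumeration in by_tuple is exponential in the number of tuples and is meant only to make the semantics concrete.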
For $i \in [1, l]$, $m_i$ is a one-to-one mapping between $S$ and $T$, and for every $i, j \in [1, l]$, $i \neq j \Rightarrow m_i \neq m_j$.
$\Pr(m_i) \in [0, 1]$ and $\sum_{i=1}^{l} \Pr(m_i) = 1$.
A schema p-mapping, $\overline{pM}$, is a set of p-mappings between relations in $\bar{S}$ and in $\bar{T}$, where every relation in either $\bar{S}$ or $\bar{T}$ appears in at most one p-mapping.

We refer to a nonprobabilistic mapping as an ordinary mapping. A schema p-mapping may contain both p-mappings and ordinary mappings. Example 1 shows a p-mapping (see Fig. 4.2a) that contains three possible mappings.
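
The two conditions on a p-mapping can be checked mechanically. The following is a minimal sketch, assuming each candidate mapping is represented as a dictionary from source attribute to target attribute. The function name validate_p_mapping is hypothetical, the attribute correspondences are inferred from the instances in Fig. 4.3, and the probabilities 0.5, 0.4, and 0.1 are illustrative assumptions, since Fig. 4.2 is not reproduced here.

```python
from math import isclose


def validate_p_mapping(candidates, tol=1e-9):
    """Check the conditions on a p-mapping.

    candidates: list of (mapping, probability) pairs, where each mapping is a
    dict from source attribute to target attribute (hypothetical encoding).
    """
    seen = set()
    for mapping, prob in candidates:
        # One-to-one: no two source attributes share a target attribute.
        if len(set(mapping.values())) != len(mapping):
            raise ValueError(f"mapping is not one-to-one: {mapping}")
        # Distinctness: i != j implies m_i != m_j.
        key = frozenset(mapping.items())
        if key in seen:
            raise ValueError(f"duplicate mapping: {mapping}")
        seen.add(key)
        # Each probability must lie in [0, 1].
        if not 0.0 <= prob <= 1.0:
            raise ValueError(f"probability out of range: {prob}")
    # The probabilities must sum to 1 (up to floating-point tolerance).
    total = sum(prob for _, prob in candidates)
    if not isclose(total, 1.0, abs_tol=tol):
        raise ValueError(f"probabilities sum to {total}, not 1")
    return True


# ASSUMPTION: correspondences inferred from Fig. 4.3; probabilities illustrative.
M1 = {"pname": "name", "email-addr": "email",
      "current-addr": "mailing-addr", "permanent-addr": "home-addr"}
M2 = {"pname": "name", "email-addr": "email",
      "permanent-addr": "mailing-addr", "current-addr": "home-addr"}
M3 = {"pname": "name", "email-addr": "mailing-addr",
      "current-addr": "home-addr"}
P_MAPPING = [(M1, 0.5), (M2, 0.4), (M3, 0.1)]

if __name__ == "__main__":
    print(validate_p_mapping(P_MAPPING))
```

An ordinary mapping fits the same representation as a p-mapping with a single candidate of probability 1.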

3.2.2 Semantics of Probabilistic Mappings
Intuitively, a probabilistic schema mapping models the uncertainty about which of the mappings in pM is the correct one. When a schema matching system produces a set of candidate matches, there are two ways to interpret the uncertainty: (1) a single mapping in pM is the correct one, and it applies to all the data in S, or (2) several mappings are partially correct, and each is suitable for a subset of tuples in S, though it is not known which mapping is the right one for a specific tuple. Figure 4.3b illustrates the first interpretation and applies mapping $m_1$. For the same example, the second interpretation is equally valid: some people may choose to use their current address as mailing address, while others use their permanent address