3.4.1 Computing Weighted Correspondences
A weighted correspondence between a pair of attributes specifies the degree of semantic similarity between them. Let $S(s_1, \ldots, s_m)$ be a source schema and $T(t_1, \ldots, t_n)$ be a target schema. We denote by $C_{i,j}$, $i \in [1,m]$, $j \in [1,n]$, the weighted correspondence between $s_i$ and $t_j$, and by $w_{i,j}$ the weight of $C_{i,j}$. The first step is to compute a weighted correspondence between every pair of attributes, which can be done by applying existing schema-matching techniques.
Although weighted correspondences tell us the degree of similarity between pairs of attributes, they do not tell us which target attribute a source attribute should map to. For example, a target attribute mailing-address can be similar to both the source attribute current-addr and the source attribute permanent-addr, so it makes sense to map either of them to mailing-address in a schema mapping. In fact, given a set of weighted correspondences, there could be a set of p-mappings that are consistent with it. We can define the one-to-many relationship between sets of weighted correspondences and p-mappings by specifying when a p-mapping is consistent with a set of weighted correspondences.
Definition 10 (Consistent p-mapping). A p-mapping $pM$ is consistent with a weighted correspondence $C_{i,j}$ between a pair of source and target attributes if the sum of the probabilities of all mappings $m \in pM$ containing correspondence $(i,j)$ equals $w_{i,j}$; that is,
$$w_{i,j} = \sum_{m \in pM,\ (i,j) \in m} \Pr(m).$$
A p-mapping is consistent with a set of weighted correspondences $\mathbf{C}$ if it is consistent with each weighted correspondence $C \in \mathbf{C}$. □
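The consistency condition of Definition 10 can be checked mechanically. The following is a minimal sketch, not the authors' implementation. It assumes a p-mapping is represented as a list of (mapping, probability) pairs, where each mapping is a frozenset of (i, j) index pairs, and that any correspondence absent from the weight table has weight 0. The function names, the representation, and the weights 0.6 and 0.4 are all hypothetical.

```python
from collections import defaultdict
import math


def induced_weights(p_mapping):
    """For each correspondence (i, j), sum Pr(m) over the mappings m containing it."""
    totals = defaultdict(float)
    for mapping, prob in p_mapping:
        for pair in mapping:
            totals[pair] += prob
    return dict(totals)


def is_consistent(p_mapping, weights, tol=1e-9):
    """Return True if the p-mapping is consistent with every weighted correspondence.

    Assumption: a pair missing from `weights` is treated as having weight 0.
    """
    totals = induced_weights(p_mapping)
    pairs = set(weights) | set(totals)
    return all(
        math.isclose(totals.get(p, 0.0), weights.get(p, 0.0), abs_tol=tol)
        for p in pairs
    )


# Hypothetical illustration: s_1 = current-addr, s_2 = permanent-addr,
# t_1 = mailing-address, with made-up weights 0.6 and 0.4.
weights = {(1, 1): 0.6, (2, 1): 0.4}
p_mapping = [
    (frozenset({(1, 1)}), 0.6),  # current-addr -> mailing-address
    (frozenset({(2, 1)}), 0.4),  # permanent-addr -> mailing-address
]
print(is_consistent(p_mapping, weights))
```

Here each correspondence appears in exactly one mapping, so the induced sums are simply the probabilities of those mappings.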
However, not every set of weighted correspondences admits a consistent p-mapping.
The following theorem shows under which conditions a consistent p-mapping exists,
and it establishes a normalization factor for weighted correspondences that will
guarantee the existence of a consistent p-mapping.
Theorem 3. Let $\mathbf{C}$ be a set of weighted correspondences between a source schema $S(s_1, \ldots, s_m)$ and a target schema $T(t_1, \ldots, t_n)$.
There exists a consistent p-mapping with respect to $\mathbf{C}$ if and only if (1) for every $i \in [1,m]$, $\sum_{j=1}^{n} w_{i,j} \le 1$ and (2) for every $j \in [1,n]$, $\sum_{i=1}^{m} w_{i,j} \le 1$.
Let
$$M' = \max\Bigl\{ \max_i \Bigl\{ \sum_{j=1}^{n} w_{i,j} \Bigr\},\ \max_j \Bigl\{ \sum_{i=1}^{m} w_{i,j} \Bigr\} \Bigr\}.$$
Then, for each $i \in [1,m]$, $\sum_{j=1}^{n} \frac{w_{i,j}}{M'} \le 1$ and for each $j \in [1,n]$, $\sum_{i=1}^{m} \frac{w_{i,j}}{M'} \le 1$. □
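A small worked instance may help make the two parts of Theorem 3 concrete. The weights below are made up for illustration, with $m = n = 2$, and $M'$ denotes the normalization factor defined in the theorem.

```latex
% Hypothetical weights: w_{1,1}=0.9, w_{1,2}=0.3, w_{2,1}=0.4, w_{2,2}=0.5
\begin{align*}
\textstyle\sum_{j} w_{1,j} &= 0.9 + 0.3 = 1.2, &
\textstyle\sum_{j} w_{2,j} &= 0.4 + 0.5 = 0.9, \\
\textstyle\sum_{i} w_{i,1} &= 0.9 + 0.4 = 1.3, &
\textstyle\sum_{i} w_{i,2} &= 0.3 + 0.5 = 0.8, \\
M' &= \max\{\max\{1.2,\,0.9\},\ \max\{1.3,\,0.8\}\} = 1.3.
\end{align*}
% After dividing every weight by M':
\begin{align*}
\text{row sums: } & 1.2/1.3 \approx 0.923, \quad 0.9/1.3 \approx 0.692, \\
\text{column sums: } & 1.3/1.3 = 1, \quad 0.8/1.3 \approx 0.615.
\end{align*}
```

The first row sum and the first column sum exceed 1, so by the first part of the theorem no consistent p-mapping exists for the original weights. After division by $M' = 1.3$, every row and column sum is at most 1.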
Based on Theorem 3, we normalize the weighted correspondences we generated as described previously by dividing them by $M'$; that is,
$$w'_{i,j} = \frac{w_{i,j}}{M'}.$$
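The normalization step can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not a definitive implementation. It assumes the weights are stored in a dictionary keyed by 1-based (i, j) pairs, with missing pairs treated as weight 0. The function names and the sample weights are hypothetical.

```python
def normalization_factor(weights, m, n):
    """Largest row sum or column sum of the m-by-n weight table (M' in Theorem 3)."""
    row_sums = [
        sum(weights.get((i, j), 0.0) for j in range(1, n + 1))
        for i in range(1, m + 1)
    ]
    col_sums = [
        sum(weights.get((i, j), 0.0) for i in range(1, m + 1))
        for j in range(1, n + 1)
    ]
    return max(max(row_sums), max(col_sums))


def normalize(weights, m, n):
    """Divide every weight by the normalization factor."""
    factor = normalization_factor(weights, m, n)
    if factor == 0:
        # All weights are zero; there is nothing to rescale.
        return dict(weights)
    return {pair: w / factor for pair, w in weights.items()}


# Hypothetical weights for a 2-by-2 case.
weights = {(1, 1): 0.9, (1, 2): 0.3, (2, 1): 0.4, (2, 2): 0.5}
normalized = normalize(weights, m=2, n=2)
print(normalization_factor(weights, 2, 2))
print(normalized)
```

After normalization, the largest row or column sum of the new weights is 1 up to floating-point rounding, so the condition of Theorem 3 holds for the normalized weights.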