Databases Reference
In-Depth Information
We can prove that under the expected-value semantics, the answer for the
SUM
operator under by-table and by-tuple semantics is the same; thus, query answer-
ing for
SUM
under the by-tuple and expected-value semantics is in PTIME.
For the
COUNT
operator, even query answering under the by-tuple semantics is
PTIME for the distribution semantics and thus also for other semantics. We next
illustrate this using an example.
For the rest of the combinations, we conjecture that query answering cannot
be finished in polynomial time and the complexity of query answering remains
open.
Example 8.
Continue with the running example (Fig.
4.2
) and consider the follow-
ing query.
Qc: SELECT COUNT(
*
) FROM S
WHERE mailing-addr = 'Sunnyvale'
Tab le
4.2
shows how the probability of each answer value changes after we pro-
cess each source tuple under the by-tuple semantics. After processing the first tuple,
the probability of
COUNT
D
0 is 0:5
C
0:1
D
0:6 (m
1
;m
3
) and that of
COUNT
D
1
D
0
is 0:4 (m
2
). After processing the second tuple, the probability of
COUNT
is the probability that
COUNT
D
0 after the first tuple times the probability of
applying m
3
to the second tuple, so 0:6
0:1
D
0:06.Thatof
COUNT
D
1 is
0:6
.0:5
C
0:4/
C
0:4
0:1
D
0:58, that is, either
COUNT
D
0 after we process the
first tuple and we apply m
1
or m
2
to the second tuple, or
COUNT
D
1 after the first
tuple and we apply m
3
to the third tuple. Similarly, the probability of
COUNT
D
2
is 0:4
.0:5
C
0:4/
D
0:36.
t
3.4
Creating p-Mappings
We now address the problem of generating a p-mapping between a source schema
and a target schema. We begin by assuming that we have a set of weighted corre-
spondences between the source attributes and the target attributes. These weighted
correspondences are created by a set of schema matching modules. However, as we
explain shortly, there can be
multiple
p-mappings that are consistent with a given
set of weighted correspondences, and the question is which of them to choose. We
describe an approach to creating p-mappings that is based on choosing the mapping
that maximizes the
entropy
of the probability assignment.
Tabl e 4. 2
Trace of query answering in Example
8
TupleID
COUNT
D
0
COUNT
D
1
COUNT
D
2
1
0:6
0:4
-
2
0:06
0:58
0.36