Databases Reference
In-Depth Information
We can prove that under the expected-value semantics, the answer for the SUM
operator under by-table and by-tuple semantics is the same; thus, query answer-
ing for SUM under the by-tuple and expected-value semantics is in PTIME.
For the COUNT operator, even query answering under the by-tuple semantics is
PTIME for the distribution semantics and thus also for other semantics. We next
illustrate this using an example.
For the rest of the combinations, we conjecture that query answering cannot
be finished in polynomial time and the complexity of query answering remains
open.
Example 8. Continue with the running example (Fig. 4.2 ) and consider the follow-
ing query.
Qc: SELECT COUNT( * ) FROM S
WHERE mailing-addr = 'Sunnyvale'
Tab le 4.2 shows how the probability of each answer value changes after we pro-
cess each source tuple under the by-tuple semantics. After processing the first tuple,
the probability of COUNT
D 0 is 0:5 C 0:1 D 0:6 (m 1 ;m 3 ) and that of COUNT
D 1
D 0
is 0:4 (m 2 ). After processing the second tuple, the probability of COUNT
is the probability that COUNT
D 0 after the first tuple times the probability of
applying m 3
to the second tuple, so 0:6 0:1 D 0:06.Thatof COUNT
D 1 is
0:6 .0:5 C 0:4/ C 0:4 0:1 D 0:58, that is, either COUNT
D 0 after we process the
first tuple and we apply m 1 or m 2 to the second tuple, or COUNT
D 1 after the first
tuple and we apply m 3
to the third tuple. Similarly, the probability of COUNT
D 2
is 0:4 .0:5 C 0:4/ D 0:36.
t
3.4
Creating p-Mappings
We now address the problem of generating a p-mapping between a source schema
and a target schema. We begin by assuming that we have a set of weighted corre-
spondences between the source attributes and the target attributes. These weighted
correspondences are created by a set of schema matching modules. However, as we
explain shortly, there can be multiple p-mappings that are consistent with a given
set of weighted correspondences, and the question is which of them to choose. We
describe an approach to creating p-mappings that is based on choosing the mapping
that maximizes the entropy of the probability assignment.
Tabl e 4. 2
Trace of query answering in Example 8
TupleID
COUNT
D 0
COUNT
D 1
COUNT
D 2
1
0:6
0:4
-
2
0:06
0:58
0.36
 
Search WWH ::




Custom Search