Databases Reference
In-Depth Information
Step 1: We generate the possible reformulations of Q (a reformulation query com-
putes all certain answers when executed on the source data) by considering every
combination of the form .m 1 ;:::;m l /,wherem i is one of the possible mappings
in pM i . Denote the set of reformulations by Q 0 1 ;:::;Q 0 k
. The probability of a
reformulation Q 0 D .m 1 ;:::;m l / is ˘ i D 1 Pr.m i /.
Step 2: For each reformulation Q 0 , retrieve each of the unique answers from the
sources. For each answer obtained by Q 0 1 [ ::: [ Q 0 k
, its probability is computed
by summing the probabilities of the Q 0 's in which it is returned.
Importantly, note that it is possible to express both steps as an SQL query with
grouping and aggregation. Therefore, if the underlying sources support SQL, we
can leverage their optimizations to compute the answers.
With our restricted form of schema mapping, the algorithm takes time poly-
nomial in the size of the data and the mappings. We, thus, have the following
complexity result.
Theorem 1. Let pM be a schem a p- mapping and let Q be an SPJ query.
Answering Q with respect to pM in by-table semantics is in PTIME in the size
of the data and the mapping.
t
3.3.2
By-tuple Query Answering
To extend the by-table query-answering strategy to by-tuple semantics, we would
need to compute the certain answers for every mapping sequence generated by pM.
However, the number of such mapping sequences is exponential in the size of the
input data. The following example shows that for certain queries, this exponential
time complexity is inevitable.
Example 4. Suppose that in addition to the tables in Example 1 ,wealsohave
U(city) in the source and V(hightech) in the target. The p-mapping for V contains
two possible mappings: (
,0.2).
Consider the following query Q, which decides if there are any people living in
a high-tech city.
f
(city, hightech)
g
,0.8)and(
;
Q: SELECT 'true'
FROM T, V
WHERE T.mailing-addr = V.hightech
An incorrect way of answering the query is to first execute the following two
subqueries Q 1
and Q 2
and then join the answers of Q 1
and Q 2 , and summing up
the probabilities.
Q1: SELECT mailing-addr FROM T
Q2: SELECT hightech FROM V
Now, consider the source instance D,whereD S isshowninFig. 4.2 a, and D U
has two tuples (“Mountain View”) and (“Sunnyvale”). Figure 4.4 a,b show Q tuple
1
.D/
 
Search WWH ::




Custom Search