Database Reference
In-Depth Information
that Q ( S )= certain M ( Q , S ). That is, it is possible to obtain the set of certain answers of
Q over S under
by just evaluating its rewriting Q over the source S .
The computation of a source rewriting of a conjunctive query is a basic step in the algo-
rithm presented in this section. This problem has been extensively studied in the database
area and, in particular, in the data integration context. In fact, it is well known that:
M
Proposition 20.30
There exists an algorithm Q UERY R EWRITING that, given a mapping
M
a finite set of st-tgds, and a conjunctive query Q over R 2 ,com-
putes a union of conjunctive queries with equality Q that is a rewriting of Q over the
source. The algorithm runs in exponential time and its output is of exponential size in the
size of
=(R 1 , R 2 ,
Σ
) , with
Σ
Σ
and Q.
Example 20.31 We give here some intuition of why the algorithm Q UERY R EWRITING
uses union and equalities in its output language. Let R 1 be a schema consisting of a unary
relation P and a binary relation R , R 2 be a schema consisting of a binary relation T and
M
=(R 1 , R 2 ,
Σ
),where
Σ
is a set of dependencies consisting of the following st-tgds:
P ( x )
T ( x , x ) ,
R ( x , y )
T ( x , y ) .
Assume that Q is the target query T ( x , y ). What is a source rewriting of Q ?Toanswer
this question, we need to consider all the possibles ways of generating target tuples from a
source instance. Let S be an instance of R 1 .If S contains a fact P ( a ), then all the solutions
for S under
x = y
over S will be in certain M ( Q , S ).Inthesameway,if S contains a fact R ( a , b ),thenall
the solutions for S under
M
will contain the fact T ( a , a ). Thus, every answer to the query P ( x )
will contain the fact T ( a , b ) and, hence, every answer to the
query R ( x , y ) over S will be in certain M ( Q , S ). Given the definition of
M
, the previous
two queries consider all the possible ways to generate target tuples according to
M
, from
which one can formally prove that the following is a source rewriting of query Q (see
Exercise 22.3 ):
M
( P ( x )
x = y )
R ( x , y ) .
It is important to notice that the above query is a union of two conjunctive queries, and that
the use of union and equality in this rewriting is unavoidable.
We finally have all the necessary ingredients to present the algorithm for computing maxi-
mum recoveries. In this procedure, C refers to the unary predicate introduced in Section 7.4
that distinguishes between constant and null values (C( a ) holds if and only if a belongs to
C ONST ). Moreover, if x =( x 1 ,..., x k ),thenC( x ) is used in the algorithm as a shorthand
for C( x 1 )
∧···∧
C( x k ).
Theorem 20.32 Algorithm M AXIMUM R ECOVERY runs in exponential time, and on input
M
=(R 1 , R 2 ,
Σ
) ,where Σ is a finite set of st-tgds, it computes a maximum recovery of M .
Search WWH ::




Custom Search