Inverting schema mappings - Foundations of Data Exchange

Database Reference

In-Depth Information

that Q ( S )= certain M ( Q , S ). That is, it is possible to obtain the set of certain answers of

Q over S under

by just evaluating its rewriting Q over the source S .

The computation of a source rewriting of a conjunctive query is a basic step in the algo-

rithm presented in this section. This problem has been extensively studied in the database

area and, in particular, in the data integration context. In fact, it is well known that:

Proposition 20.30

There exists an algorithm Q UERY R EWRITING that, given a mapping

a finite set of st-tgds, and a conjunctive query Q over R 2 ,com-

putes a union of conjunctive queries with equality Q that is a rewriting of Q over the

source. The algorithm runs in exponential time and its output is of exponential size in the

size of

=(R 1 , R 2 ,

) , with

and Q.

Example 20.31 We give here some intuition of why the algorithm Q UERY R EWRITING

uses union and equalities in its output language. Let R 1 be a schema consisting of a unary

relation P and a binary relation R , R 2 be a schema consisting of a binary relation T and

=(R 1 , R 2 ,

),where

is a set of dependencies consisting of the following st-tgds:

P ( x )

→

T ( x , x ) ,

R ( x , y )

→

T ( x , y ) .

Assume that Q is the target query T ( x , y ). What is a source rewriting of Q ?Toanswer

this question, we need to consider all the possibles ways of generating target tuples from a

source instance. Let S be an instance of R 1 .If S contains a fact P ( a ), then all the solutions

for S under

x = y

over S will be in certain M ( Q , S ).Inthesameway,if S contains a fact R ( a , b ),thenall

the solutions for S under

will contain the fact T ( a , a ). Thus, every answer to the query P ( x )

∧

will contain the fact T ( a , b ) and, hence, every answer to the

query R ( x , y ) over S will be in certain M ( Q , S ). Given the definition of

, the previous

two queries consider all the possible ways to generate target tuples according to

, from

which one can formally prove that the following is a source rewriting of query Q (see

Exercise 22.3 ):

( P ( x )

∧

x = y )

∨

R ( x , y ) .

It is important to notice that the above query is a union of two conjunctive queries, and that

the use of union and equality in this rewriting is unavoidable.

We finally have all the necessary ingredients to present the algorithm for computing maxi-

mum recoveries. In this procedure, C refers to the unary predicate introduced in Section 7.4

that distinguishes between constant and null values (C( a ) holds if and only if a belongs to

C ONST ). Moreover, if x =( x 1 ,..., x k ),thenC( x ) is used in the algorithm as a shorthand

for C( x 1 )

∧···∧

C( x k ).

Theorem 20.32 Algorithm M AXIMUM R ECOVERY runs in exponential time, and on input

=(R 1 , R 2 ,

) ,where Σ is a finite set of st-tgds, it computes a maximum recovery of M .

Foundations of Data Exchange

Search WWH ::

Custom Search

Home