Good solutions - Foundations of Data Exchange

Database Reference

In-Depth Information

We show next that it outputs the core if the chase computing the canonical universal

solution does not fail. Assume that T is the result of a successful chase sequence for S

under

. We first show that each instance T ∗ that is obtained from T inside the for loop

in lines (6-10) is a universal solution for S . By definition, ( S , T ∗ )

Σ st .Since T ∗ is a

Σ t . Thus, T ∗ is a

solution for S . Finally, T ∗ is a subinstance of T , and, therefore, there is a homomorphism

from T ∗ into any other solution. This proves that T ∗ is also a universal solution for S .

Assume now that T ∗ is the result of C OMPUTE C ORE (

Σ t ,wehavethat T ∗ |

subinstance of T and T satisfies each egd in

)for S . We prove next that T ∗ is

the core of the universal solutions for S .Aswehavejustproved, T ∗ is a universal solution

and hence it contains a copy T 0 of the core of the universal solutions as a subinstance. It

follows from Theorem 6.15 that T 0 ∈

S OL M ( S ). Assume now, for the sake of contradiction,

= T ∗ and hence that there is a fact R ( t )

T ∗ \

T 0 . Thus, T ∗ \{

R ( t )

that T 0

∈

}

is a solution

for S (because it contains T 0 ), and hence it satisfies

Σ st . This implies that the result of the

algorithm could not have been T ∗ . This is our desired contradiction.

Unfortunately, the algorithm described above cannot be easily adapted to more complex

mappings that have both tgds and egds among target constraints. Indeed, a crucial assump-

tion in the previous algorithm is that whenever an instance satisfies an egd, then every

subinstance of it also satisfies it. However, this assumption fails for tgds. For example,

the instance T =

V ( x ), but removing the fact V ( a )

yields an instance that does not satisfy it. In fact, if the previous greedy algorithm is applied

directly to mappings in which

{

U ( a ) , V ( a )

}

satisfies the tgd U ( x )

→

Σ t contains at least one tgd, then the resulting instance may

fail to be a solution.

Thus, more sophisticated techniques need to be developed if one wants to compute cores

of universal solutions in polynomial time in the presence of tgds. Unfortunately, no simple

adaptation of the previous greedy algorithm is known to solve this problem. Hence different

techniques have to be developed, which are based on the blocks method that is described

below.

Let us assume for the time being that we deal with mappings without target dependen-

cies. The blocks method relies on the following observation. If T is the canonical universal

solution of a source instance S with respect to a set of st-tgds

Σ st , then the Gaifman graph

of the nulls of T consists of a set of connected components (blocks) of size bounded by a

constant c that depends only on the mapping (which we assume to be fixed). By the Gaif-

man graph of nulls of T we mean the graph whose nodes are the nulls of T , such that two

nulls

⊥ 1 ,

⊥ 2 are adjacent if and only if there is a tuple in some relation of T that mentions

⊥ 2 .

A crucial observation is that checking whether there is a homomorphism from T into an

arbitrary instance T can be done in polynomial time. The justification is that this problem

boils down to the problem of checking whether each block of T has a homomorphism into

T . The latter can be solved in polynomial time since the size of each block is bounded by

c - hence there are polynomially many candidates for such a homomorphism. It follows

that computing the core of the canonical universal solution T can be done in polynomial

⊥ 1 and

both

Foundations of Data Exchange

Search WWH ::

Custom Search

Home