Good solutions - Foundations of Data Exchange

Database Reference

In-Depth Information

time. Indeed, it suffices to check whether there is a homomorphism h : T

T such that

the size of the image of T under h is strictly less than the size of T . Then we replace T by

h ( T ), and iteratively continue this process until reaching a fixed-point.

The blocks method was also extended to the case when

→

Σ t consists of a set of egds. There

is an extra difficulty in this case. The property mentioned above, the bounded block-size

in the Gaifman graph of the canonical universal solution T of a source instance S ,isno

longer true in the presence of egds. This is because the chase, when applied to egds, can

equate nulls from different blocks, and thus, create blocks of nulls of larger and larger size.

This problem is solved by a surprising rigidity lemma stating the following. Let T be the

canonical universal solution for a source instance with respect to a set of st-tgds

Σ st ,and

let T be the target instance obtained from T by chasing with the egds in

Σ t .Theniftwo

in T , then the

nulls

⊥ 1 and

⊥ 2 in different blocks of T are replaced by the same null

⊥

is rigid .Thatis,if h : T →

T is a homomorphism then h (

, and thus, T has

null

⊥

the bounded block-size property if we treat those nulls as constants.

The situation is much more complex in the presence of tgds. This is because the canon-

ical universal solution T for a source instance S does not have the bounded block-size

property, and in addition, it is no longer true that equated nulls are rigid. A refined version

of the blocks method has been developed; it was used to show that computing cores of

universal solutions for mappings whose set of target dependencies consists of egds and a

weakly acyclic set of tgds can be done in polynomial time.

Theorem 6.18

Σ t con-

sists of a set of egds and a weakly acyclic set of tgds. There is a polynomial-time algorithm

that, for every source instance S, checks whether a solution for S exists, and if that is the

case, computes the core of the universal solutions for S.

Let

=(R s , R t ,

Σ st ,

Σ t ) be a fixed relational mapping, such that

The proof of this result, which is based on the notion of hypertree decompositions, is

technically quite involved and goes beyond the scope of this topic.

Summary

In summary, the core is a universal solution that has several good properties for data ex-

change purposes: it is unique up to isomorphism, it is the smallest universal solution, and

it can be computed in polynomial time for mappings with a weakly acyclic set of tgds.

However, the choice of a solution to materialize does not boil down exclusively to the

size of such a solution. After all, we materialize solutions to answer queries; in addition,

we should also consider the cost of building a solution. We have already seen that it is

costlier to build the core than the canonical universal solution (as building the latter is the

first step in building the core). In addition, we shall see in the next chapter that although the

core and the canonical universal solution are indistinguishable with respect to some basic

query answering tasks, the canonical universal solution behaves better than the core when

handling more expressive query languages.

We conclude that there is a tradeoff in deciding which solution - canonical universal or

core - to materialize:

Foundations of Data Exchange

Search WWH ::

Custom Search

Home