Building solutions - Foundations of Data Exchange

Database Reference

In-Depth Information

pattern is clearly at least as hard as deciding if it exists, which constitutes the satisfiabil-

ity problem. If the schema is part of the input, the satisfiability problem is NP-hard by

Theorem 11.7 . It is not difficult to find fixed schemas for which the problem is NP-hard

(see Exercise 16.3 ). This means that if we want a polynomial solution-building procedure,

reduction to constructive satisfiability is not enough.

12.3 Nested-relational DTDs

As we learned in the previous section, the approach via a single pattern combining all the

target requirements leads to an inefficient algorithm. In this section we introduce a different

approach: instead of combining the requirements and then trying to find a solution satis-

fying the obtained pattern, we first find partial solutions for all target requirements, and

then try to combine the solutions. We first implement this general recipe in a simple setting

of SM nr (

), i.e., mappings with nested-relational DTDs using only vertical navigation,

and identify difficulties arising in the general case.

⇓

,

∼

Definition 12.6 (Partial solution) Let

M

be mapping and T a source tree. A partial

solution for a target requirement

ψ

( a , z )

∈ Δ T ,M is a tree that conforms to the target schema

( a , b ) for some tuple b .

of

M

and satisfies

ψ

The first step of the recipe is easy. Since the target side patterns are bounded by the

size of the mapping, the partial solutions for all target requirements can be constructed in

polynomial time by Lemma 12.3 if the mapping is fixed. The difficult part is to combine

the partial solutions into one tree that satisfies all target side patterns.

Let us begin with an example. Consider the schema r

ab ∗

→

with a , b :@ att r and two trees

S = r [ a (1) , b (2)] ,

T = r [ a (1) , b (3)] .

A natural combination of S and T is

S

⊕

T = r [ a (1) , b (2) , b (3)] .

Clearly S

⊕

T contains all nodes of S and T , and more generally, S

⊕

T satisfies all patterns

from SM nr (

) that are satisfied by S or T .

Note that not all nodes of S and T are treated in the same way. The nodes b (2) and

b (3) are put separately in S

⇓

,

∼

T ,butthe a (1)-nodes and r -nodes from S and T are merged

together. This seems the only reasonable thing to do, but it is only possible under certain

assumptions: the nodes can be put separately only if the schema allows it, and they can be

merged only if they store the same data value. For instance, if we take r [ a (3) , b (3)] instead

of T , the trees cannot be combined. Likewise for D replaced with r

⊕

ab .

This gives rise to the following strategy: if the schema allows it, put both nodes, oth-

erwise merge the nodes. But how do we distinguish the nodes that must be merged? For

nested-relational DTDs they can be identified easily.

→

Foundations of Data Exchange

Search WWH ::

Custom Search

Home