Database Reference
In-Depth Information
pattern is clearly at least as hard as deciding if it exists, which constitutes the satisfiabil-
ity problem. If the schema is part of the input, the satisfiability problem is NP-hard by
Theorem 11.7 . It is not difficult to find fixed schemas for which the problem is NP-hard
(see Exercise 16.3 ). This means that if we want a polynomial solution-building procedure,
reduction to constructive satisfiability is not enough.
12.3 Nested-relational DTDs
As we learned in the previous section, the approach via a single pattern combining all the
target requirements leads to an inefficient algorithm. In this section we introduce a different
approach: instead of combining the requirements and then trying to find a solution satis-
fying the obtained pattern, we first find partial solutions for all target requirements, and
then try to combine the solutions. We first implement this general recipe in a simple setting
of SM nr (
), i.e., mappings with nested-relational DTDs using only vertical navigation,
and identify difficulties arising in the general case.
,
Definition 12.6 (Partial solution) Let
M
be mapping and T a source tree. A partial
solution for a target requirement
ψ
( a , z )
Δ T ,M is a tree that conforms to the target schema
( a , b ) for some tuple b .
of
M
and satisfies
ψ
The first step of the recipe is easy. Since the target side patterns are bounded by the
size of the mapping, the partial solutions for all target requirements can be constructed in
polynomial time by Lemma 12.3 if the mapping is fixed. The difficult part is to combine
the partial solutions into one tree that satisfies all target side patterns.
Let us begin with an example. Consider the schema r
ab
with a , b :@ att r and two trees
S = r [ a (1) , b (2)] ,
T = r [ a (1) , b (3)] .
A natural combination of S and T is
S
T = r [ a (1) , b (2) , b (3)] .
Clearly S
T contains all nodes of S and T , and more generally, S
T satisfies all patterns
from SM nr (
) that are satisfied by S or T .
Note that not all nodes of S and T are treated in the same way. The nodes b (2) and
b (3) are put separately in S
,
T ,butthe a (1)-nodes and r -nodes from S and T are merged
together. This seems the only reasonable thing to do, but it is only possible under certain
assumptions: the nodes can be put separately only if the schema allows it, and they can be
merged only if they store the same data value. For instance, if we take r [ a (3) , b (3)] instead
of T , the trees cannot be combined. Likewise for D replaced with r
ab .
This gives rise to the following strategy: if the schema allows it, put both nodes, oth-
erwise merge the nodes. But how do we distinguish the nodes that must be merged? For
nested-relational DTDs they can be identified easily.
Search WWH ::




Custom Search