Database Reference
In-Depth Information
Proof As we did before, we start with the pattern
δ T ,M ( z ) obtained from the set
( a , z ) ϕ
( a , b )
Δ T ,M =
{ ψ
( x , y )
→∃
z
ψ
( x , z )
Σ st , T
|
=
ϕ
}
by first renaming variables so that each is used in one pattern only, and then merging all
the patterns at the root. Recall that a tree conforming to the target DTD is a solution if and
only if it satisfies
δ T ,M ( a ) for some tuple a .
The next step is completing
δ T ,M , i.e., adding all the nodes required by the target DTD.
Before, the pattern only used the child relation, and completion could be done in a unique
way. Now // or are allowed and there can be more than one way to complete a pattern. A
basis should reflect all those possibilities and that is why the procedure will now compute
a set of patterns corresponding to all possible choices.
The idea is to start from replacing wildcards with concrete labels and descendants with
concrete paths in a way consistent with the target DTD. The algorithm processes the nodes
of the patterns top-down and creates new patterns when facing a choice.
Let
Φ
=
{ δ T ,M }
. While
Φ
contains a pattern with an unprocessed node, proceed as
follows. Choose a pattern
and an unprocessed node v such that the path from the root
to v contains only processed nodes. Let
ϕ Φ
ψ
be the subpattern rooted in v and let
σ
be the label
of its parent. If the production for
σ
in the target DTD is empty, simply remove
ϕ
from
Φ
.
Otherwise, remove
ϕ
from
Φ
and instead add the output of the following procedure:
= ( t )[
if
ψ
ψ 1 ,...,
ψ k ] and
Γ
,return
ϕ
with the root of
ψ
marked as processed;
= ( t )[
if
ψ
ψ 1 ,...,
ψ k ], return the set of patterns that can be obtained from
ϕ
by replacing
( t )[
ψ
with
τ
ψ 1 ,...,
ψ k ] whose root is marked as processed and
τ
is some label occurring
t |
attributes, unless t is empty);
in the production for
σ
(
τ
must have
|
ψ , return two patterns that can be obtained by replacing in
if
ψ
= //
ϕ
the pattern
ψ
by
ψ or by //
ψ with all nodes unmarked.
Every node is processed at most twice, and each processing introduces at most
new
patterns. The procedure terminates after at most exponentially many steps. In particular,
the number of produced patterns is single-exponential. The size of each pattern is polyno-
mial in the size of T . It is fairly easy to see that the disjunction of the obtained formulae is
equivalent to
| Γ |
δ T , M (over trees conforming to the target DTD). If we now apply the original
completion procedure cpl D t
to each pattern in
Φ
, we obtain an equivalent set of complete
patterns.
Finally, each of the obtained patterns needs to be merged, as described in Section 13.4 .
Remove
from the set of obtained patterns. Like before, each of the remaining patterns
is a solution (viewed as a tree). Together they form a basis of solutions: each solution T
satisfies the disjunctions of those patterns, so there must be a homomorphism from one of
them to T . If the obtained set of patterns is empty, the tree T has no solutions at all.
Theorem 14.16 and the recipe sketched out in the beginning of this section give us an
algorithm - albeit an expensive one - for computing certain answers.
Search WWH ::




Custom Search