XML-to-XML queries - Foundations of Data Exchange

Database Reference

In-Depth Information

Proof As we did before, we start with the pattern

δ T ,M ( z ) obtained from the set

( a , z ) ϕ

( a , b )

Δ T ,M =

{ ψ

( x , y )

→∃

( x , z )

∈ Σ st , T

}

by first renaming variables so that each is used in one pattern only, and then merging all

the patterns at the root. Recall that a tree conforming to the target DTD is a solution if and

only if it satisfies

δ T ,M ( a ) for some tuple a .

The next step is completing

δ T ,M , i.e., adding all the nodes required by the target DTD.

Before, the pattern only used the child relation, and completion could be done in a unique

way. Now // or are allowed and there can be more than one way to complete a pattern. A

basis should reflect all those possibilities and that is why the procedure will now compute

a set of patterns corresponding to all possible choices.

The idea is to start from replacing wildcards with concrete labels and descendants with

concrete paths in a way consistent with the target DTD. The algorithm processes the nodes

of the patterns top-down and creates new patterns when facing a choice.

Let

{ δ T ,M }

. While

contains a pattern with an unprocessed node, proceed as

follows. Choose a pattern

and an unprocessed node v such that the path from the root

to v contains only processed nodes. Let

ϕ ∈ Φ

be the subpattern rooted in v and let

be the label

of its parent. If the production for

in the target DTD is empty, simply remove

from

Otherwise, remove

from

and instead add the output of the following procedure:

= ( t )[

•

ψ 1 ,...,

ψ k ] and

∈ Γ

,return

with the root of

marked as processed;

= ( t )[

•

ψ 1 ,...,

ψ k ], return the set of patterns that can be obtained from

by replacing

( t )[

with

ψ 1 ,...,

ψ k ] whose root is marked as processed and

is some label occurring

t |

attributes, unless t is empty);

in the production for

(

must have

ψ , return two patterns that can be obtained by replacing in

•

= //

the pattern

ψ or by //

ψ with all nodes unmarked.

Every node is processed at most twice, and each processing introduces at most

new

patterns. The procedure terminates after at most exponentially many steps. In particular,

the number of produced patterns is single-exponential. The size of each pattern is polyno-

mial in the size of T . It is fairly easy to see that the disjunction of the obtained formulae is

equivalent to

| Γ |

δ T , M (over trees conforming to the target DTD). If we now apply the original

completion procedure cpl D t

to each pattern in

, we obtain an equivalent set of complete

patterns.

Finally, each of the obtained patterns needs to be merged, as described in Section 13.4 .

Remove

from the set of obtained patterns. Like before, each of the remaining patterns

is a solution (viewed as a tree). Together they form a basis of solutions: each solution T

satisfies the disjunctions of those patterns, so there must be a homomorphism from one of

them to T . If the obtained set of patterns is empty, the tree T has no solutions at all.

⊥

Theorem 14.16 and the recipe sketched out in the beginning of this section give us an

algorithm - albeit an expensive one - for computing certain answers.

Foundations of Data Exchange

Search WWH ::

Custom Search

Home