Database Reference
In-Depth Information
Definition 12.7 Fix a DTD D over
Γ
. For a node v in a tree conforming to T we can
consider the label sequence
σ n that is read on the downward path from the root to
v . We say that the label sequence is unambiguous if for all i , the symbol
σ 1 σ 2 ...
σ i +1 occurs in the
production for
σ i as
σ i +1 ?or
σ i +1 .
+
i +1 in the production for
σ i +1 or
If
σ i +1 occurred as
σ
σ i ,thena
σ i -labeled node could
have had many different children labeled
σ i +1 , and there would be some ambiguity regard-
ing the appearance of a
σ i +1 -labeled node as its child. If, however,
σ i +1 occurs as
σ i +1 ?or
σ i +1 in the production for
σ i , then the position of a
σ i +1 -labeled child of a
σ i -labeled node
is unambiguous.
Note that in every tree conforming to D
nodes with unambiguous label sequences form a subtree which contains a root;
there is at most one node with any given unambiguous label sequence (there may be
none if
σ i +1 occurs in the production for
σ
i as
σ i +1 ?forsome i ).
The second property implies that nodes with the same unambiguous label sequence taken
from S and T always need to be merged. It turns out that all other nodes can be put into
S
T without merging.
Thus, the partial solutions we are constructing need to store the same data values in
nodes with the same unambiguous label sequences, which means that they are all exten-
sions of a single XML tree containing all unambiguous label sequences. It is possible that
some of these sequences correspond to nodes optional in the schema and are not required
by the patterns, but adding them does not spoil the partial solution.
The idea of the algorithm is then to guess this tree, construct partial solutions extending
it, and then combine them. The manipulations on parts of trees performed by the algorithm
are most easily described in terms of forests and contexts.
Definition 12.8 A forest is a sequence of trees. We write F + G for the concatenation of
forests F and G .
A multicontext C over an alphabet
Γ
is a tree over
Γ ∪{◦}
such that
-labeled nodes have
at most one child. The nodes labeled with
are called ports .A context is a multicontext
with a single port, which is additionally required to be a leaf. A leaf port u can be substi-
tuted with a forest F , which means that in the sequence of the children of u 's parent, u is
replaced by the roots of F . An internal port u can be substituted with a context C with one
port u : first the subtree rooted at u 's only child is substituted at u , then the obtained tree
is substituted at u .
For a context C and a forest F we write C
F to denote the tree obtained by substituting
the unique port of C with F .Ifweuseacontext D instead of the forest F , the result of the
substitution is a context as well. More generally, for a multicontext M with leaf ports only,
say u 1 , u 2 ,..., u m , we write M ( T 1 , T 2 ,..., T m ) for the tree obtained by substituting T i at u i ,
for 1
·
i
m .
Instead of the subtree containing all unambiguous label sequences, it is convenient to
Search WWH ::




Custom Search