Database Reference
InDepth Information
3. the set of attributes for each node labeled
is
A
D
(
).
In particular, if an
labeled node is a leaf in a tree
T
that conforms to
D
, then the empty
string
must belong to the language
P
D
(
).
We shall often deal with a very common subclass of DTDs based on simple regular
expressions. Examples of such simple expressions are
country
∗
and
ruler
∗
,orforex
ample,
book
·
chapter
∗
,or
book
·
author
∗
·
publisher
?, where
? is an abbreviation for
ε

ε
(i.e., an optional element). Formally, these are defined as follows.
Definition 10.5 ADTD
D
is
nestedrelational
if two conditions are satisfied:
•
Every production in it is of the form
ˆ
1
·
ˆ
m
,
→
...
·
and each
ˆ
i
is either
i
,or
i
,or
i
,or
where all the
i
's are distinct labels from
Γ
i
? =
i

ε
.
•
The DTD
D
is not recursive, i.e., there are no cycles in the graph in which we put a
directed edge from
to all the
i
's for each production above.
The DTD used in our example in this section is nestedrelational. In fact, empirical
studies show that these are very common in practice, and cover many, if not most, real
world DTDs. As we will see shortly, many computational problems become easier for
them.
Automata and schemas
There are specification languages which are richer than DTDs. In those languages (e.g.,
XML Schema, or RelaxNG), one can specify more general
regular
properties of trees. For
example, one can define the string of labels of children of a node based on the whole path
to the node from the root, not just its label. In general, all these languages are captured by
the power of automata for trees.
We have explained how automata operate on binary trees (see
Section 2.5
). In such
trees, each non leaf node has exactly two children, so transition functions are of the form
Q
2
Q
and define possible states of a node, based on its label and the states of
its two children. In the case of unranked trees, we do not know a priori how many children
a node would have. Instead of introducing a separate transition function δ
n
:
Q
n
×
Q
×
Γ
→
×
Γ
→
2
Q
for nodes with
n
children, these automata are defined with a different transition function,
that assigns regular languages of
states
to states and labels.
Definition 10.6 An
unranked nondeterministic finite tree automaton
(UNFTA) on un
ranked nodelabeled trees over alphabet
Γ
is defined as
A
=(
Q
,
Γ
,
F
,
δ
) where
•
Q
is the set of states,
•
F
⊆
Q
is the set of final, or accepting states, and
2
Q
∗
is the transition function such that
•
δ
:
Q
×
Γ
→
δ
(
q
,
a
) is a regular language over the
set of states
Q
,forevery
q
∈
Q
and
a
∈
Γ
.