3. the set of attributes for each node labeled is A D ( ).
In particular, if an -labeled node is a leaf in a tree T that conforms to D , then the empty
must belong to the language P D ( ).
We shall often deal with a very common subclass of DTDs based on simple regular
expressions. Examples of such simple expressions are country ∗ and ruler ∗ ,orforex-
ample, book · chapter ∗ ,or book · author ∗ · publisher ?, where ? is an abbreviation for
(i.e., an optional element). Formally, these are defined as follows.
Definition 10.5 ADTD D is nested-relational if two conditions are satisfied:
Every production in it is of the form
ˆ 1 ·
ˆ m ,
and each ˆ i is either i ,or i ,or i ,or
where all the i 's are distinct labels from
i ? = i | ε
The DTD D is not recursive, i.e., there are no cycles in the graph in which we put a
directed edge from to all the i 's for each production above.
The DTD used in our example in this section is nested-relational. In fact, empirical
studies show that these are very common in practice, and cover many, if not most, real-
world DTDs. As we will see shortly, many computational problems become easier for
Automata and schemas
There are specification languages which are richer than DTDs. In those languages (e.g.,
XML Schema, or RelaxNG), one can specify more general regular properties of trees. For
example, one can define the string of labels of children of a node based on the whole path
to the node from the root, not just its label. In general, all these languages are captured by
the power of automata for trees.
We have explained how automata operate on binary trees (see Section 2.5 ). In such
trees, each non leaf node has exactly two children, so transition functions are of the form
2 Q and define possible states of a node, based on its label and the states of
its two children. In the case of unranked trees, we do not know a priori how many children
a node would have. Instead of introducing a separate transition function δ n : Q n
× Γ →
× Γ → 2 Q
for nodes with n children, these automata are defined with a different transition function,
that assigns regular languages of states to states and labels.
Definition 10.6 An unranked nondeterministic finite tree automaton (UNFTA) on un-
ranked node-labeled trees over alphabet
is defined as
=( Q ,
, F ,
Q is the set of states,
Q is the set of final, or accepting states, and
2 Q ∗ is the transition function such that
× Γ →
( q , a ) is a regular language over the
set of states Q ,forevery q
Q and a