Database Reference
In-Depth Information
2.5 Basics of automata theory
In the last part of the topic, where we deal with XML data exchange, we shall need some
basic facts about automata on words and trees. Most of the notions of schemas for XML
are automata-based; since XML documents themselves are modeled as labeled unranked
trees, automata that one uses are those that run on such trees. In this section we review the
basics of automata on words, and on binary trees. Automata on XML trees will be defined
in the last part of the topic.
If
Σ stands for the set of all finite words over
.Welet
ε denote the empty word. A nondeterministic finite automaton (NFA) is a tuple A =
( Q ,
Σ
is a finite alphabet, then
Σ
Σ
, q 0 , F ,
δ
),where
Q is a finite set of states, with the initial state q 0
Q and the set of final states F
Q ,
and
δ
2 Q
: Q
× Σ
is the transition function.
Given a word w = a 0 a 1 ... a n 1 of length n over
Σ
,a run of
A
on w is a function
ρ
:
{
0 ,..., n
1
}→
Q such that:
δ
( q 0 , a 0 );and
1.
ρ
(0)
2.
ρ
( i +1)
δ
(
ρ
( i ) , a i +1 ) for 0 < i < n
1.
A run shows in which state the automaton can be after reading each letter a i of w .Itstarts
in q 0 and, whenever it is in state q and reads letter a , it moves to a state in
δ
( q , a ).Note
that there could be more than one run of an NFA on a string.
Arunis accepting if
ρ
( n
1)
F , i.e., if it ends in a final state. A word w is accepted
by
A
if there exists an accepting run of
A
on w . The set of all words accepted by
A
is the
language accepted by the automaton, denoted by
L
(
A
). Languages arising in this way
are called regular .
There is another familiar way of specifying regular languages, namely by means of
regular expressions. These are defined by the grammar:
e , e := 0
e |
e |
e .
| ε |
a , a
Σ |
·
e
e
These will be used in some XML schema specifications (DTDs). We assume that the reader
is familiar with regular expressions. A regular expression can be converted into an equiva-
lent automaton in polynomial time.
There are some standard decision problems related to NFAs. One of them is nonempti-
ness: given
accept at least one word? This problem is
known to be NL OGSPACE -complete (and thus solvable in polynomial time). At the other
end, one can ask whether the automaton is universal, i.e.,
A
,is
L
(
A
)
= 0? That is, does
A
Σ . This problem is
P SPACE -complete (and thus it is very unlikely to be solvable in polynomial time). The
problem of checking, for two automata
L
(
A
)=
A 1 and
A 2 ,whether
L
(
A 1 )
⊆ L
(
A 2 ) is also
P SPACE -complete.
Automata can run not only on words but also on trees. We now briefly review automata
Search WWH ::




Custom Search