Scanners (Compiler Writing) Part 3

Nondeterministic Finite-State Acceptors

Nondeterministic finite-state acceptors (NFA), or nondeterministic finite automata, are similar to deterministic finite-state acceptors except that there may be more than one possible state that is attainable from a given state for the same input character.

Table 4-1

State

Input

0

1

S

B

A

A

C

S

B

S

C

C

A

B

Table 4-2 Traces of two strings using the DFA

Input string

110101

11101

M(S, 110101) = A/(/U0101)

M(S,11101) = M(A,1101)

= M(S, 0101)

= Af(5,101)

= A/(fl,101)

= M(A,01)


= M(C, 01)

= M(C, 1)

= M(A, 1)

= M(B,e)

= M(S, e)

= B (reject)

= S (accept)

Thus an ambiguity seems to arise because the acceptor has several possibilities to choose from. However, the acceptor is considered to explore all possibilities, one at a time, to determine if a string should be accepted. Consider the NFA given in Fig. 4-6 which accepts strings of the form amb" where m, n > 1. A string will be accepted by the NFA if at least one valid sequence of state transitions exists such that the acceptor is in a final state when the entire string has been read. From the finite-state diagram, it should be observed that in order for the acceptor to accept a string if possible, it stays in state A until the final character a is read. The acceptor then changes to state B. Similarly, the NFA stays in state B until the final character b is read, causing a transition to state C.

Definition 4-3 A nondeterministic finite-state acceptor is a 5-tuple (K, VT, M, S, Z) where:

A" is a finite, nonempty set of states.

VT is an input alphabet.

M is a mapping of K X VT into subsets oftmp4C8-162_thumb[2][2] tmp4C8-163_thumb[2][2]is the starting state, tmp4C8-164_thumb[2][2]is the set of final states.

Notice that the mapping function M is similar to that for a DFA except that M maps into a possibly empty set of states. Hence we write M(Q,T) = {Px, P2,…, Pn} where a transition from state Q to one of the states Px, P2,…,P„ occurs when a character T from VT is read. The mapping function for an NFA is extended with the following definitions:

An example of a nondeterministic finite-state acceptor.

Figure 4-6 An example of a nondeterministic finite-state acceptor.

 

tmp4C8-169_thumb[2][2]

The first definition implies that for the empty string, the acceptor must stay in the same state. The meaning of the second definition is less obvious. If M(Q, T) = {Px, P2,.. -, P„], then the second definition can be written as M(Q,Tt) =

tmp4C8-170_thumb[2][2]Thus, M(Q,Tt) is the set of all possible states that the acceptor may end up in after the string Tt has been read. The third definition extends the mapping function totmp4C8-171_thumb[2][2]and defines the mapping to be the union of all the sets resulting from applying each individual state from the set of states to the string x.

A string x is said to be accepted by an NFA if at least one of the states attainable from the starting state by reading x is a final state; that is, x is accepted by the NFA F = (K, VT, M, S, Z) if for some statetmp4C8-172_thumb[2][2]then

tmp4C8-173_thumb[2][2]

As an example of an NFA, consider a machine which accepts the set of strings in {a,b,c}* such that the last symbol in the input string also appears earlier in the string. For example, bab is accepted but cbbca is not. The finite-state diagram for this acceptor is given in Fig. 4-7. For M as defined in Table 4-3, the acceptor is given as

tmp4C8-178_thumb[2][2]

With the input string aca, the value of M(q0, aca) can be determined as follows:

tmp4C8-179_thumb[2][2]

Table 4-3 Transition table for a nondeterministic finite-state acceptor

State

Input symbol

a

. h

c

tmp4C8-180 tmp4C8-181 tmp4C8-182 tmp4C8-183
tmp4C8-184 tmp4C8-185 tmp4C8-186 tmp4C8-187
tmp4C8-188 tmp4C8-189 tmp4C8-190 tmp4C8-191
tmp4C8-192 tmp4C8-193 tmp4C8-194 tmp4C8-195
tmp4C8-196 tmp4C8-197 tmp4C8-198 tmp4C8-199

 

Transition diagram for a nondeterministic finite-state acceptor.

Figure 4-7 Transition diagram for a nondeterministic finite-state acceptor.

Now

tmp4C8-201_thumb[2][2]

so

tmp4C8-202_thumb[2][2]

Hence

tmp4C8-203_thumb[2][2]

and sincetmp4C8-204_thumb[2][2]the string aca is accepted.

It should be emphasized that although a nondeterministic machine does not have unique moves, it does not contain a random device for choosing its move. Rather, the acceptor explores all possible move sequences, and if at least one of these sequences leads to a final state, then the input string is accepted.

While it might seem likely that nondeterministic finite-state acceptors are more powerful than deterministic acceptors, the following theorem proves that such is not the case. The theorem also provides a method for converting an NFA to a DFA.

Theorem 4-1 Lettmp4C8-205_thumb[2][2]be a nondeterministic finite-state acceptor accepting a set of strings L. Define a DFAtmp4C8-206_thumb[2][2] as follows:

1. The alphabet of states consists of all the subsets of K. An element of K’ is denoted as [S1( S2,..., S1,], where Sv S2,…,SI are states of K. The states Sx, S2,…,SI are in the same canonical order, so that for some states in K, {S1,S2}(= {S2,S,}) is always S2].

2. The set of input characters VT is the same for F and F’.

3. The mapping M’ is defined as

tmp4C8-210_thumb[2][2]

where

tmp4C8-211_thumb[2][2]

4. If the starting state of F is S„ then S’ = [5,].

5. Z’ is the set of all states in K’ containing a final state in Z.

Then the set of strings accepted by F’ is the same as that for F.

Proof It must be shown that L(F’) = L(F) with the construction given in the theorem. Thus it is necessary to show that M’(S’, x) = P2,…, PJ is equivalent to M(S, x) = {Pv P2,…, P,}. This can be done by induction on the length of the string x. The result is trivial for |x| = 0, since x = e and S’ = [S] from the mapping M(S,e)= {5}.

Assume that M’(S’,x) is equivalent to M(S,x) fortmp4C8-212_thumb[2][2]Then for

tmp4C8-214_thumb[2][2]

By the inductive hypothesis

tmp4C8-215_thumb[2][2]

is equivalent to

tmp4C8-216_thumb[2][2]

By definition, we have

tmp4C8-217_thumb[2][2]

is equivalent to

tmp4C8-218_thumb[2][2]

Thus the equivalence holds for

tmp4C8-219_thumb[2][2]

and

Thus the equivalence holds for

tmp4C8-220_thumb[2][2]

is equivalent to

tmp4C8-221_thumb[2][2]

Thus

tmp4C8-222_thumb[2][2]

The construction given in the theorem may be applied to any NFA. For example, the NFAtmp4C8-223_thumb[2][2]where the mapping

function is defined in Table 4-4, can be converted to a DFA as follows: Define

tmp4C8-224_thumb[2][2]where VT is the same as for F and the starting state is tmp4C8-225_thumb[2][2]The states of K’ are initially defined astmp4C8-226_thumb[2][2]and the mapping M’ is readily obtained from the transition table for F. The mapping of these states is

tmp4C8-231_thumb[2][2]

A new state

tmp4C8-232_thumb[2][2]

is created and from the mapping of F

tmp4C8-233_thumb[2][2]

the mapping M’ is defined as

tmp4C8-234_thumb[2][2]

Similarly, the new state [<7lt q2] is added to K’ and the mappings

tmp4C8-235_thumb[2][2]

are defined. As no new states were created with the last mapping definitions, the creation of the DFA is nearly complete. The set of final states of F’, which is all that is left to be defined, are those states of K’ which contain an element of Z. In this example,

tmp4C8-236_thumb[2][2]

The notion of a nondeterministic finite-state acceptor is extended to include state transitions with the empty string as input in the next subsection. The next subsection also shews that such an acceptor is no more powerful than an acceptor which cannot change state when reading the empty string.

Nondeterministic Finite-State Acceptors with e-Transitions

In this subsection, nondeterministic finite-state acceptors are allowed to have state transitions with the empty string used as input. As might be expected, such an acceptor is no more powerful than the finite-state acceptors introduced in the previous subsections.

Table 4-4

State

Input

a

b

tmp4C8-237 tmp4C8-238 tmp4C8-239
tmp4C8-240 tmp4C8-241 tmp4C8-242
tmp4C8-243 tmp4C8-244 tmp4C8-245

 

A nondeterministic finite-state acceptor with f-transitions

Figure 4-8 A nondeterministic finite-state acceptor with f-transitions

Rather, it is a convenient construct for representing regular expressions.

The transition diagram for an NFA with e-transitions is given in Fig. 4-8. This machine accepts strings of the form 1"’0" where m, n > 0. After reading a sequence of Is from the input string, the acceptor changes to state q{, before reading the 0s which follow. This is done by reading the empty string which is found between the last 1 and the first 0.

Definition 4-4 A nondeterministic finite-state acceptor with e-transitions is a 5-tuple (K,Vt,M,S,Z), where K, VT, S, and Z are the same as in Definition 4-3 and M is the mappingtmp4C8-247_thumb[2][2]into subsets of K.

Using this definition, the NFA depicted in Fig. 4-8 can be defined as

tmp4C8-248_thumb[2][2]with the mapping function defined by Table 4-5.

Before the equivalence of an NFA with e-transitions and an NFA without e-transitions can be proved, it is necessary to define e-transitions and define the mapping function for M(q,x) where .tmp4C8-249_thumb[2][2]

Definition 4-5 e-CLOSURE is a set of states of an NFA with e-transitions, say F, such that e-CLOSURE for some state of F, call it q, includes all states attainable from that state by making state transitions with e-transitions. This is denoted by e-CLOSURE(^).

In the example of Fig. 4-8,tmp4C8-250_thumb[2][2]andtmp4C8-251_thumb[2][2]

tmp4C8-252_thumb[2][2]Notice thattmp4C8-253_thumb[2][2]for some state q invariably includes that state because a state can always be thought of as having an e-transition to itself. Furthermore,tmp4C8-254_thumb[2][2]for a set of states P. The definition oftmp4C8-255_thumb[2][2]allows the mapping function to be extended with the definitions:

Table 4-5

Input

State

0

1

E

tmp4C8-265 tmp4C8-266 tmp4C8-267 tmp4C8-268
tmp4C8-269 tmp4C8-270 tmp4C8-271 tmp4C8-272

 

tmp4C8-273_thumb[2][2]

The first definition states what the mapping of a state to a set of siates is when the empty string is read. In the second case, the mapping K X is defined, allowing the application of the mapping function to include strings.

It is now possible to show the equivalence of NFAs with e-transitions to those without. The proof of the following theorem gives the method for constructing an NFA without e-transitions.

Theorem 4-2 Definetmp4C8-274_thumb[2][2]as an NFA with e-transitions.

Then there exists an NFA F’ without e-transitions such that L(F) = L(F’). In other words, the languages accepted by the two acceptors are the same.

Proof Define an NFA F’ without e-transitions astmp4C8-276_thumb[2][2]where K, VT, and S are the same as for F and

tmp4C8-277_thumb[2][2] 

tmp4C8-278_thumb[2][2]

and M’(q,a) is M(q,a) fortmp4C8-280_thumb[2][2]where M is the mapping function M of F extended to strings. By induction on |x| it is necessary to show thattmp4C8-281_thumb[2][2]. Trivially iftmp4C8-282_thumb[2][2]thentmp4C8-283_thumb[2][2] S andtmp4C8-284_thumb[2][2]if a final state is

contained intmp4C8-285_thumb[2][2]

The inductive step is as follows. Assumetmp4C8-286_thumb[2][2] andtmp4C8-287_thumb[2][2]Then

tmp4C8-296_thumb[2][2]

We must show that M’(S, x) contains a final state if and only if M(S,x) contains a final state. It was previously shown that such is the case of x = e. Considertmp4C8-297_thumb[2][2]By the construction of F’, if

Table 4-6

Input

State

0 1

tmp4C8-299 tmp4C8-300
tmp4C8-301 tmp4C8-302

 

Figure 4-9

Figure 4-9

M(S, x) contains a final state, then so does M’(S. x). We must also show the converse; that is, M(S,x) contains a final state if M’(S,x) also does. Consider M’(S,x) containing a state of Z’ other than S. Then M(S,x) must contain a corresponding final state, say, Z. This comes from the construction of F’. Furthermore, iftmp4C8-304_thumb[2][2]then we have a state in tmp4C8-305_thumb[2][2]and F also in M(S, x) becausetmp4C8-306_thumb[2][2]

tmp4C8-307_thumb[2][2]

Using the construction from the proof given above, an NFA without e-transitions can be constructed from the NFA given near the beginning of this subsection. The acceptortmp4C8-308_thumb[2][2]I has its mapping function as defined in Table 4-6. The transition diagram for F’ is given in Fig. 4-9.

This section has introduced three types of finite-state acceptors with each new acceptor appearing to be more powerful than the previous one. It was proven that the languages which can be accepted by each acceptor were the same, however. The next two sections demonstrate the equivalence of regular grammars and regular expressions to finite-state acceptors. Methods to convert from a regular grammar or a regular expression to a finite-state acceptor and vice versa are also described.

Next post:

Previous post: