Finite-State Machines and Pushdown Automata - Models of Computation: Exploring the Power of Computing

Information Technology Reference

In-Depth Information

with the smallest number of states is called the state minimization problem and is explored

in Section 4.7 .

4.3 Regular Expressions

In this section we introduce regular expressions, algebraic expressions over sets of individual

letters that describe the class of languages recognized by finite-state machines, as shown in the

next section.

Regular expressions are formed through the concatenation, union, and Kleene closure of

sets of strings. Given two sets of strings L 1 and L 2 ,their concatenation L 1 ·

L 2 is the set

{

; that is, the set of strings consisting of an arbitrary string of L 1

followed by an arbitrary string of L 2 . (We often omit the concatenation operator

∈

L 1 and v

∈

L 2 }

,writing

variables one after the other instead.) The union of L 1 and L 2 , denoted L 1 ∪

L 2 ,istheset

of strings that are in L 1 or L 2 or both. The Kleene closure of a set L of strings, denoted L ∗

(also called the Kleene star ), is defined in terms of the i -fold concatenation of L with itself,

namely, L i = L

L i− 1 ,where L 0 =

{

}

, the set containing the empty string:

L ∗ = ∞

L i

i = 0

Thus, L ∗ is the union of strings formed by concatenating zero or more words of L . Finally, we

define the positive closure of L to be the union of all i -fold products except for the zeroth,

that is,

L + = ∞

L i

i = 1

The positive closure is a useful shorthand in regular expressions.

An example is helpful. Let L 1 =

{

}

and L 2 =

{

0, aba

}

;then L 1 L 2 =

{

010, 01 aba ,

01, 11

110, 11 aba

}

, L 1 ∪

L 2 =

{

0, 01, 11, aba

}

,and

L 2 =

} ∗ =

{

0, aba

{

,0, aba , 00, 0 aba , aba 0, abaaba , ...

}

Note that the definition given earlier for Σ ∗ , namely, the set of strings over the finite alphabet

Σ , coincides with this new definition of the Kleene closure. We are now prepared to define

regular expressions.

DEFINITION 4.3.1 Regular expressions over the finite alphabet Σ and the languages they de-

scribe are defined recursively as follows:

∅

is a regular expression denoting the empty set.

2. is a regular expression denoting the set

{

}

3. For each letter a

∈

Σ , a is a regular expression denoting the set

{

}

containing a .

4. If r and s are regular expressions denoting the languages R and S ,then ( rs ) , ( r + s ) ,and

( r ∗ ) are regular expressions denoting the languages R

S ,and R ∗ , respectively.

S , R

∪

The languages denoted by regular expressions are called regular languages . (They are also often

called regular sets.)

Models of Computation: Exploring the Power of Computing

Search WWH ::

Custom Search

Home