Information Technology Reference
In-Depth Information
with the smallest number of states is called the state minimization problem and is explored
in Section 4.7 .
4.3 Regular Expressions
In this section we introduce regular expressions, algebraic expressions over sets of individual
letters that describe the class of languages recognized by finite-state machines, as shown in the
next section.
Regular expressions are formed through the concatenation, union, and Kleene closure of
sets of strings. Given two sets of strings L 1 and L 2 ,their concatenation L 1 ·
L 2 is the set
{
; that is, the set of strings consisting of an arbitrary string of L 1
followed by an arbitrary string of L 2 . (We often omit the concatenation operator
uv
|
u
L 1 and v
L 2 }
·
,writing
variables one after the other instead.) The union of L 1 and L 2 , denoted L 1
L 2 ,istheset
of strings that are in L 1 or L 2 or both. The Kleene closure of a set L of strings, denoted L
(also called the Kleene star ), is defined in terms of the i -fold concatenation of L with itself,
namely, L i = L
L i− 1 ,where L 0 =
·
{
}
, the set containing the empty string:
L =
L i
i = 0
Thus, L is the union of strings formed by concatenating zero or more words of L . Finally, we
define the positive closure of L to be the union of all i -fold products except for the zeroth,
that is,
L + =
L i
i = 1
The positive closure is a useful shorthand in regular expressions.
An example is helpful. Let L 1 =
{
}
and L 2 =
{
0, aba
}
;then L 1 L 2 =
{
010, 01 aba ,
01, 11
110, 11 aba
}
, L 1
L 2 =
{
0, 01, 11, aba
}
,and
L 2 =
} =
{
0, aba
{
,0, aba , 00, 0 aba , aba 0, abaaba , ...
}
Note that the definition given earlier for Σ , namely, the set of strings over the finite alphabet
Σ , coincides with this new definition of the Kleene closure. We are now prepared to define
regular expressions.
DEFINITION 4.3.1 Regular expressions over the finite alphabet Σ and the languages they de-
scribe are defined recursively as follows:
1.
is a regular expression denoting the empty set.
2. is a regular expression denoting the set
{
}
.
3. For each letter a
Σ , a is a regular expression denoting the set
{
a
}
containing a .
4. If r and s are regular expressions denoting the languages R and S ,then ( rs ) , ( r + s ) ,and
( r ) are regular expressions denoting the languages R
S ,and R , respectively.
·
S , R
The languages denoted by regular expressions are called regular languages . (They are also often
called regular sets.)
Search WWH ::




Custom Search