Information Technology Reference
In-Depth Information
with the smallest number of states is called the
state minimization problem
and is explored
in Section
4.7
.
4.3 Regular Expressions
In this section we introduce regular expressions, algebraic expressions over sets of individual
letters that describe the class of languages recognized by finite-state machines, as shown in the
next section.
Regular expressions are formed through the concatenation, union, and Kleene closure of
sets of strings. Given two sets of strings
L
1
and
L
2
,their
concatenation
L
1
·
L
2
is the set
{
; that is, the set of strings consisting of an arbitrary string of
L
1
followed by an arbitrary string of
L
2
. (We often omit the concatenation operator
uv
|
u
∈
L
1
and
v
∈
L
2
}
·
,writing
variables one after the other instead.) The
union
of
L
1
and
L
2
, denoted
L
1
∪
L
2
,istheset
of strings that are in
L
1
or
L
2
or both. The
Kleene closure
of a set
L
of strings, denoted
L
∗
(also called the
Kleene star
), is defined in terms of the
i
-fold concatenation of
L
with itself,
namely,
L
i
=
L
L
i−
1
,where
L
0
=
·
{
}
, the set containing the empty string:
L
∗
=
∞
L
i
i
=
0
Thus,
L
∗
is the union of strings formed by concatenating zero or more words of
L
. Finally, we
define the
positive closure
of
L
to be the union of all
i
-fold products except for the zeroth,
that is,
L
+
=
∞
L
i
i
=
1
The positive closure is a useful shorthand in regular expressions.
An example is helpful. Let
L
1
=
{
}
and
L
2
=
{
0,
aba
}
;then
L
1
L
2
=
{
010, 01
aba
,
01, 11
110, 11
aba
}
,
L
1
∪
L
2
=
{
0, 01, 11,
aba
}
,and
L
2
=
}
∗
=
{
0,
aba
{
,0,
aba
, 00, 0
aba
,
aba
0,
abaaba
,
...
}
Note that the definition given earlier for
Σ
∗
, namely, the set of strings over the finite alphabet
Σ
, coincides with this new definition of the Kleene closure. We are now prepared to define
regular expressions.
DEFINITION
4.3.1
Regular expressions
over the finite alphabet
Σ
and the languages they de-
scribe are defined recursively as follows:
1.
∅
is a regular expression denoting the empty set.
2.
is a regular expression denoting the set
{
}
.
3. For each letter
a
∈
Σ
,
a
is a regular expression denoting the set
{
a
}
containing
a
.
4. If
r
and
s
are regular expressions denoting the languages
R
and
S
,then
(
rs
)
,
(
r
+
s
)
,and
(
r
∗
)
are regular expressions denoting the languages
R
S
,and
R
∗
, respectively.
·
S
,
R
∪
The languages denoted by regular expressions are called
regular languages
. (They are also often
called regular sets.)
Search WWH ::
Custom Search