Java Reference
In-Depth Information
• λ
is a regular expression denoting the set that contains only the empty
string. This set is not the same as the empty set because it does contain
one element.
•
The symbol
s
is a regular expression denoting
{
s
}
: a set containing the
single symbol
s
∈Σ
.
If
A
and
B
are regular expressions, then
A
|
B
,
AB
,and
A
are also regular
expressions. They denote, respectively, the alternation, catenation, and
Kleene closure of the corresponding regular sets.
•
Each regular expression denotes a regular set. Any finite set of strings can be
represented by a regular expression of the form (
s
1
|
s
2
| ... |
s
k
). Thus, the
reserved words of ANSI C can be defined as (auto
).
The following additional operations are also useful. They are not strictly
necessary because their e
|
break
|
case
|...
ect can be obtained (perhaps somewhat clumsily)
using the three standard regular operators (alternation, catenation, and Kleene
closure):
ff
•
P
+
, sometimes called
positive closure
, denotes all strings consisting of
one or more strings in
P
catenated together:
P
=
(
P
+
|λ
)and
P
+
=
PP
.
1)
+
is the set of all strings containing one
For example, the expression (0
|
or more bits.
•
If
A
is a set of characters, Not(A) denotes (
Σ
-
A
), that is, all
characters
in
Σ
not included in
A
.SinceNot(A) can never be larger than
Σ
and
Σ
is finite, Not(A) must also be finite. Therefore it is regular. Not(A)
does not contain
is not a character (it is a zero-length string).
As an example, Not(Eol) is the set of all characters excluding Eol (the
end-of-line character; in Java or C,
λ
because
λ
n).
It is possible to extend Not() to strings, rather than just
\
Σ
.If
S
is a set of
Σ
-
S
), that is, the set of all strings
except those in
S
.AlthoughNot(
S
) is usually infinite, it also is regular if
S
is regular (Exercise 18).
strings, we can define Not(
S
)tobe(
If
k
is a constant, then the set
A
k
represents all strings formed by catenat-
ing
k
(possibly di
•
erent) strings from
A
.Thatis,
A
k
=
ff
(
AAA
...) (
k
copies).
1)
32
is the set of all bit strings exactly 32 bits long.
Thus, (0
|
We next provide some examples that use regular expressions to specify some
common programming language tokens. In these definitions,
D
is the set of
the ten single digits and
L
is the set of all upper- and lower-case letters.