Java Reference
In-Depth Information
• λ
is a regular expression denoting the set that contains only the empty
string. This set is not the same as the empty set because it does contain
one element.
The symbol s is a regular expression denoting
{ s }
: a set containing the
single symbol s ∈Σ
.
If A and B are regular expressions, then A | B , AB ,and A are also regular
expressions. They denote, respectively, the alternation, catenation, and
Kleene closure of the corresponding regular sets.
Each regular expression denotes a regular set. Any finite set of strings can be
represented by a regular expression of the form ( s 1
| s 2
| ... | s k
). Thus, the
reserved words of ANSI C can be defined as (auto
).
The following additional operations are also useful. They are not strictly
necessary because their e
|
break
|
case
|...
ect can be obtained (perhaps somewhat clumsily)
using the three standard regular operators (alternation, catenation, and Kleene
closure):
ff
P + , sometimes called positive closure , denotes all strings consisting of
one or more strings in P catenated together: P =
( P +
)and P + = PP .
1) + is the set of all strings containing one
For example, the expression (0
|
or more bits.
If A is a set of characters, Not(A) denotes (
Σ
- A ), that is, all characters
in
Σ
not included in A .SinceNot(A) can never be larger than
Σ
and
Σ
is finite, Not(A) must also be finite. Therefore it is regular. Not(A)
does not contain
is not a character (it is a zero-length string).
As an example, Not(Eol) is the set of all characters excluding Eol (the
end-of-line character; in Java or C,
λ
because
λ
n).
It is possible to extend Not() to strings, rather than just
\
Σ
.If S is a set of
Σ - S ), that is, the set of all strings
except those in S .AlthoughNot( S ) is usually infinite, it also is regular if
S is regular (Exercise 18).
strings, we can define Not( S )tobe(
If k is a constant, then the set A k represents all strings formed by catenat-
ing k (possibly di
erent) strings from A .Thatis, A k =
ff
( AAA ...) ( k copies).
1) 32 is the set of all bit strings exactly 32 bits long.
Thus, (0
|
3.3 Examples
We next provide some examples that use regular expressions to specify some
common programming language tokens. In these definitions, D is the set of
the ten single digits and L is the set of all upper- and lower-case letters.
 
 
Search WWH ::




Custom Search