Character Class    Set of Characters Denoted
[abc]              Three characters: a, b, and c
[cba]              Three characters: a, b, and c
[a-c]              Three characters: a, b, and c
[aabbcc]           Three characters: a, b, and c
[^abc]             All characters except a, b, and c
[\^\-\]]           Three characters: ^, -, and ]
[^]                All characters
"[abc]"            Not a character class. This is an example of one
                   five-character string: [abc].
Figure 3.9: Lex character class definitions.
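The rows of Figure 3.9 can be checked mechanically. The following is a minimal sketch, not Lex's own implementation: the helper expand_class is hypothetical, and it assumes an 8-bit alphabet (character codes 0 to 255) so that "all characters" is a finite set. It handles a leading ^, x-y ranges, and backslash escapes, which is enough to show that [abc], [cba], [a-c], and [aabbcc] all denote the same three characters.

```python
# Hypothetical sketch: expand a Lex-style character class into the set of
# characters it denotes. Assumes an 8-bit alphabet (character codes 0-255).
ALPHABET = frozenset(chr(c) for c in range(256))


def expand_class(pattern: str) -> frozenset:
    """Return the characters denoted by a class such as "[a-c]" or "[^abc]".

    Handles a leading ^ (complement), x-y ranges, and backslash escapes.
    Malformed input is not diagnosed.
    """
    if not pattern.startswith("["):
        raise ValueError("not a character class: " + pattern)
    i = 1
    negate = pattern[i:i + 1] == "^"
    if negate:
        i += 1
    members = set()

    def read_char() -> str:
        # Read one character; an escaped character is taken literally.
        nonlocal i
        ch = pattern[i]
        i += 1
        if ch == "\\" and i < len(pattern):
            ch = pattern[i]
            i += 1
        return ch

    while i < len(pattern) and pattern[i] != "]":
        lo = read_char()
        hi = lo
        # An unescaped '-' between two characters forms a range such as a-c.
        if pattern[i:i + 1] == "-" and pattern[i + 1:i + 2] not in ("", "]"):
            i += 1
            hi = read_char()
        members.update(chr(c) for c in range(ord(lo), ord(hi) + 1))
    return ALPHABET - members if negate else frozenset(members)


for p in ["[abc]", "[cba]", "[a-c]", "[aabbcc]", r"[\^\-\]]"]:
    print(p, sorted(expand_class(p)))
print("[^abc]", len(expand_class("[^abc]")), "characters")
print("[^]", len(expand_class("[^]")), "characters")
```

The quoted row "[abc]" is a five-character string rather than a class, so this sketch simply rejects it with a ValueError.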
A character class is delimited by [ and ]; individual characters are catenated
without any quotation or separators. However, \, ^, ], and - must be escaped
because of their reserved meanings (see below) in character classes. Thus [xyz]
represents the class that can match a single x, y, or z. The expression [\])]
represents the class that can match a single ] or ). The ] is escaped so that it
is not misinterpreted as the end-of-character-class symbol.
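To see why the ] in [\])] must be escaped, consider how a scanner generator might locate the end of a class. In this hedged sketch the helpers class_end and class_members are hypothetical: a backslash causes the next character to be skipped, so only an unescaped ] closes the class. The sketch follows the rule as stated above; it does not model other Lex implementations that treat a ] immediately after [ specially.

```python
def class_end(pattern: str, start: int = 0) -> int:
    """Return the index of the ] that closes the class opened at pattern[start].

    A backslash escapes the character after it, so an escaped ] does not
    end the class.
    """
    if pattern[start:start + 1] != "[":
        raise ValueError("no character class at the given position")
    i = start + 1
    while i < len(pattern):
        if pattern[i] == "\\":
            i += 2  # skip the backslash and the character it escapes
        elif pattern[i] == "]":
            return i
        else:
            i += 1
    raise ValueError("unterminated character class")


def class_members(pattern: str) -> set:
    """Members of a simple class (no ranges or ^), with escapes removed."""
    body = pattern[1:class_end(pattern)]
    members, i = set(), 0
    while i < len(body):
        if body[i] == "\\" and i + 1 < len(body):
            i += 1  # drop the escaping backslash, keep the next character
        members.add(body[i])
        i += 1
    return members


print(sorted(class_members(r"[\])]")))  # [')', ']']
print(class_end("[])]"))                # 1: unescaped, the class ends early
```

Without the backslash, the scanner stops at the first ], leaving an empty class followed by the stray text )].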
Ranges of characters are separated by a -; for example, [x-z] is the same as
[xyz]. [0-9] is the set of all digits, and [a-zA-Z] is the set of all letters,
both uppercase and lowercase.
\ is the escape character; it is used to represent unprintable characters and to
escape special symbols. Following C conventions, \n is the newline (that is,
end-of-line), \t is the tab character, \\ is the backslash symbol itself, and
\010 is the character whose code is 10 in octal (base 8) form, that is, 8 in
decimal.
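Both rules can be illustrated with two small functions. This is a sketch under stated assumptions: decode_escape and expand_range are hypothetical names, only the escapes named above (\n, \t, \\, and octal codes) are treated specially, and any other escaped character stands for itself. Ranges are expanded by character code, which is why [x-z] and [xyz] coincide.

```python
# Hypothetical sketch of how a scanner generator might interpret the escapes
# and ranges described above. Only \n, \t, \\ and octal codes are special
# here; any other escaped character stands for itself.
SIMPLE_ESCAPES = {"n": "\n", "t": "\t", "\\": "\\"}
OCTAL_DIGITS = "01234567"


def decode_escape(seq: str) -> str:
    r"""Decode one escape sequence such as \n, \\ or \010 into a character."""
    if len(seq) < 2 or seq[0] != "\\":
        raise ValueError("not an escape sequence: " + seq)
    body = seq[1:]
    if len(body) <= 3 and all(c in OCTAL_DIGITS for c in body):
        return chr(int(body, 8))  # octal character code, e.g. 010 -> 8
    if len(body) == 1:
        return SIMPLE_ESCAPES.get(body, body)
    raise ValueError("unsupported escape sequence: " + seq)


def expand_range(lo: str, hi: str) -> set:
    """Characters lo through hi inclusive, ordered by character code."""
    return {chr(c) for c in range(ord(lo), ord(hi) + 1)}


print(ord(decode_escape(r"\010")))                          # 8
print(len(expand_range("a", "z") | expand_range("A", "Z")))  # 52
```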
The ^ symbol complements a character class; it is Lex's representation of the
Not() operation. For example, [^xy] is the character class that matches any
single character except x and y. The ^ symbol applies to all characters that
follow it in the character class definition, so [^0-9] is the set of all
characters that are not digits. [^] can be used to match all characters. (Avoid
the use of \0 in character classes because it can be confused with the null
character's special use as the end-of-string terminator in C.) Figure 3.9
illustrates various character classes and the character sets they define.
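Complementation is set difference against the whole alphabet, and the warning about \0 follows from how C reads strings. In this sketch the complement helper is hypothetical and assumes an 8-bit alphabet. The last lines use Python's standard ctypes module to read a byte string the way C does, stopping at the first NUL byte, to show how a class containing \0 would be silently cut short.

```python
import ctypes

# Assumption: an 8-bit alphabet, so "all characters" means codes 0-255.
ALPHABET = frozenset(chr(c) for c in range(256))


def complement(members) -> frozenset:
    """Model Lex's [^...]: every character of the alphabet except the members."""
    return ALPHABET - frozenset(members)


digits = frozenset(chr(c) for c in range(ord("0"), ord("9") + 1))
not_xy = complement("xy")        # [^xy]
not_digit = complement(digits)   # [^0-9]
everything = complement("")      # [^]

# Why \0 is risky: C code that treats pattern text as a NUL-terminated
# string stops at the first \0, silently dropping the rest of the class.
seen_by_c = ctypes.c_char_p(b"[a\0b]").value

print(len(not_digit), seen_by_c)  # 246 b'[a'
```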
Using character classes, we can easily define ac identifiers, as shown in
Figure 3.10. The character class includes the characters a through e, g and h,
j through o, and finally q through z. We can concisely represent the 23
characters that may form an ac identifier without having to enumerate them all.
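The ranges listed above can be written as the single class [a-eghj-oq-z]. This spelling is an assumption derived from the description, not a copy of Figure 3.10. The sketch below uses Python's re module, whose bracket ranges behave like Lex's for plain lowercase letters, and the helper is_ac_id is hypothetical. Counting the matching letters confirms the 23 characters and shows which three are left out.

```python
import re
import string

# Assumption: the ac identifier class written out from the description
# above as a Python regular expression.
AC_ID = re.compile(r"[a-eghj-oq-z]")


def is_ac_id(text: str) -> bool:
    """True if text is exactly one character from the ac identifier class."""
    return AC_ID.fullmatch(text) is not None


members = [c for c in string.ascii_lowercase if is_ac_id(c)]
excluded = [c for c in string.ascii_lowercase if not is_ac_id(c)]
print(len(members), excluded)  # 23 ['f', 'i', 'p']
```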