Character Classes
The metacharacters [ and ] (left and right brackets) are used to specify a character class inside a regular expression. A character class is a set of characters. The regular expression engine will attempt to match one character from the set. Note that a character class has no relation to a class construct or class files in Java. The character class "[ABC]" will match the character A, B, or C. For example, the strings "A@V", "B@V", and "C@V" will match the regular expression "[ABC]@.". However, the string "H@V" will not match the regular expression "[ABC]@." because @ is not preceded by A, B, or C. As another example, the strings "man" and "men" will match the regular expression "m[ae]n".
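A minimal sketch of the examples above using java.util.regex. It relies on Matcher.find(), which reports whether the pattern occurs anywhere in the input. The class name CharacterClassBasics and the helper containsMatch are hypothetical, chosen only for illustration.

```java
import java.util.regex.Pattern;

class CharacterClassBasics {
    // Returns true if regex occurs anywhere in input (Matcher.find()),
    // not only when the whole input matches.
    static boolean containsMatch(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        String[] inputs = {"A@V", "B@V", "C@V", "H@V"};
        for (String s : inputs) {
            // Prints true for A@V, B@V, C@V and false for H@V
            System.out.println(s + " -> " + containsMatch("[ABC]@.", s));
        }
        // [ae] accepts exactly one character: a or e
        System.out.println("man -> " + containsMatch("m[ae]n", "man")); // true
        System.out.println("men -> " + containsMatch("m[ae]n", "men")); // true
    }
}
```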
When I use the word “match”, I mean that the pattern exists in a string. I do not mean that the whole string matches the pattern. For example, "WEB@JDOJO.COM" matches the pattern "[ABC]@.", because @ is preceded by B. The string "A@BAND@YEA@U" matches the pattern "[ABC]@." twice even though the string contains three @ signs. The second @ is not a part of any match, because it is preceded by D and not A, B, or C.
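The sketch below collects every occurrence that Matcher.find() reports, which makes it easy to see that only two of the three @ signs take part in a match. The class name CountMatches and the helper findAll are hypothetical. The last line contrasts this with String.matches(), which succeeds only when the entire string matches the pattern.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CountMatches {
    // Collects every non-overlapping occurrence of regex in input, left to right.
    static List<String> findAll(String regex, String input) {
        List<String> found = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(findAll("[ABC]@.", "A@BAND@YEA@U"));  // [A@B, A@U]
        System.out.println(findAll("[ABC]@.", "WEB@JDOJO.COM")); // [B@J]
        // String.matches() requires the whole string to match, unlike find()
        System.out.println("WEB@JDOJO.COM".matches("[ABC]@."));  // false
    }
}
```

The D@Y occurrence in the middle is skipped because D is not in the class [ABC].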
You can also specify a range of characters using a character class. The range is expressed using a hyphen (-) character. For example, "[A-Z]" in a regular expression represents any uppercase English letter; "[0-9]" represents any digit between 0 and 9. If you use ^ at the beginning of a character class, it means complement (meaning not). For example, "[^ABC]" means any character except A, B, and C. The character class "[^A-Z]" represents any character except uppercase English letters. If you use ^ anywhere in a character class except at the beginning, it loses its special meaning (i.e., the special meaning of complement) and it matches just a ^ character. For example, "[ABC^]" will match A, B, C, or ^.
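A small sketch that tests single characters against the classes from the preceding paragraph. It uses Pattern.matches(), which requires the whole input (here a one-character string) to match the class. The class name RangesAndComplement and the helper matchesChar are hypothetical.

```java
import java.util.regex.Pattern;

class RangesAndComplement {
    // True if the single character c is a member of the given character class.
    static boolean matchesChar(String charClass, char c) {
        return Pattern.matches(charClass, String.valueOf(c));
    }

    public static void main(String[] args) {
        System.out.println(matchesChar("[A-Z]", 'Q'));  // true: uppercase letter
        System.out.println(matchesChar("[0-9]", '7'));  // true: digit
        System.out.println(matchesChar("[^ABC]", 'B')); // false: B is excluded
        System.out.println(matchesChar("[^A-Z]", 'q')); // true: not uppercase
        System.out.println(matchesChar("[ABC^]", '^')); // true: ^ is literal here
        System.out.println(matchesChar("[ABC^]", 'D')); // false
    }
}
```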
You can also include two or more ranges in one character class. For example, "[a-zA-Z]" matches any letter a through z or A through Z. "[a-zA-Z0-9]" matches any letter a through z (uppercase or lowercase) or any digit 0 through 9. Some examples of character classes are listed in Table 14-1.
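To see a multi-range class at work, this sketch counts the characters of a string that fall in any of the three ranges of "[a-zA-Z0-9]". The class name MultipleRanges and the method countAlphanumeric are illustrative only.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MultipleRanges {
    // Counts characters in a-z, A-Z, or 0-9; each find() consumes one character.
    static int countAlphanumeric(String input) {
        Matcher m = Pattern.compile("[a-zA-Z0-9]").matcher(input);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // J, a, v, a, 1, 7 are counted; the space and ! are not
        System.out.println(countAlphanumeric("Java 17!")); // 6
    }
}
```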
Table 14-1. Examples of Character Classes

Character Class    Meaning                                               Category
[abc]              Character a, b, or c                                  Simple character class
[^xyz]             A character except x, y, and z                        Complement or negation
[a-p]              Characters a through p                                Range
[a-cx-z]           Characters a through c, or x through z, which would   Union
                   include a, b, c, x, y, or z
[0-9&&[4-8]]       Intersection of two ranges (4, 5, 6, 7, or 8)         Intersection
[a-z&&[^aeiou]]    All lowercase letters minus vowels; that is, all      Subtraction
                   lowercase consonants
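The sketch below applies several classes from Table 14-1 to sample strings and prints the characters each one accepts, which makes the union, intersection, and subtraction forms easy to compare. The class name ClassOperations and the helper matchedChars are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ClassOperations {
    // Concatenates, in order, every character of input accepted by the class.
    static String matchedChars(String charClass, String input) {
        StringBuilder sb = new StringBuilder();
        Matcher m = Pattern.compile(charClass).matcher(input);
        while (m.find()) {
            sb.append(m.group());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String letters = "abcdefghijklmnopqrstuvwxyz";
        String digits = "0123456789";
        System.out.println(matchedChars("[a-cx-z]", letters));        // abcxyz (union)
        System.out.println(matchedChars("[0-9&&[4-8]]", digits));     // 45678 (intersection)
        System.out.println(matchedChars("[a-z&&[^aeiou]]", letters)); // bcdfghjklmnpqrstvwxyz (subtraction)
        System.out.println(matchedChars("[^xyz]", "xylophone"));      // lophone (complement)
    }
}
```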
Predefined Character Classes
Some frequently used predefined character classes are listed in Table 14-2.