Java Reference
Character Classes
The metacharacters [ and ] (left and right brackets) are used to specify a character class inside a regular expression.
A character class is a set of characters. The regular expression engine will attempt to match one character from the
set. Note that a character class has no relation to the class construct or class files in Java. The character class "[ABC]"
will match the character A, B, or C. For example, the strings "A@V", "B@V", and "C@V" will match the regular expression
"[ABC]@.". However, the string "H@V" will not match the regular expression "[ABC]@." because @ is not preceded by A,
B, or C. As another example, the strings "man" and "men" will match the regular expression "m[ae]n".
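The examples above can be checked with a minimal sketch built on java.util.regex. The class name CharClassBasics and the helper contains are hypothetical names chosen for illustration; the helper uses Matcher.find(), which reports whether the pattern occurs anywhere in the input.

```java
import java.util.regex.Pattern;

class CharClassBasics {
    // Hypothetical helper: true if the pattern occurs anywhere in the input.
    static boolean contains(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        String regex = "[ABC]@.";
        // A@V, B@V, and C@V contain a match; H@V does not.
        for (String s : new String[] {"A@V", "B@V", "C@V", "H@V"}) {
            System.out.println(s + " -> " + contains(regex, s));
        }
        // "m[ae]n" accepts a or e in the middle position only.
        for (String s : new String[] {"man", "men", "min"}) {
            System.out.println(s + " -> " + contains("m[ae]n", s));
        }
    }
}
```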
When I use the word “match”, I mean that the pattern occurs somewhere in the string. I do not mean that the whole string
matches the pattern. For example, "WEB@JDOJO.COM" matches the pattern "[ABC]@.", because @ is preceded by B.
The string "A@BAND@YEA@U" matches the pattern "[ABC]@." twice even though the string contains three @ signs.
The second @ is not a part of any match, because it is preceded by D and not A, B, or C.
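The difference between "the pattern occurs in the string" and "the whole string matches the pattern" corresponds to Matcher.find() versus Matcher.matches(). The following is a small sketch; the class name FindVersusMatches and the helper countMatches are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FindVersusMatches {
    // Hypothetical helper: counts the non-overlapping occurrences of the pattern.
    static int countMatches(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        Pattern p = Pattern.compile("[ABC]@.");

        // find() looks for the pattern anywhere in the input.
        System.out.println(p.matcher("WEB@JDOJO.COM").find());    // true
        // matches() requires the whole input to match the pattern.
        System.out.println(p.matcher("WEB@JDOJO.COM").matches()); // false

        // Three @ signs, but only A@B and A@U are matches.
        System.out.println(countMatches("[ABC]@.", "A@BAND@YEA@U")); // 2
    }
}
```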
You can also specify a range of characters using a character class. The range is expressed using a hyphen (-)
character. For example, "[A-Z]" in a regular expression represents any uppercase English letter; "[0-9]" represents
any digit between 0 and 9. If you use ^ at the beginning of a character class, it means complement (meaning not). For
example, "[^ABC]" means any character except A, B, and C. The character class "[^A-Z]" represents any character
except an uppercase English letter. If you use ^ anywhere in a character class except at the beginning, it loses its special
meaning (i.e., the special meaning of complement) and it matches just a ^ character. For example, "[ABC^]" will match
A, B, C, or ^.
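Ranges and complements can be tried out with a short sketch like the one below. The class name RangeAndComplement and the helper inClass are hypothetical. The helper relies on String.matches(), which requires the whole string to match, so passing a one-character string tests whether that character belongs to the class.

```java
class RangeAndComplement {
    // Hypothetical helper: true if the single character c belongs to the character class.
    static boolean inClass(String charClass, char c) {
        return String.valueOf(c).matches(charClass);
    }

    public static void main(String[] args) {
        System.out.println(inClass("[A-Z]", 'Q'));   // true: uppercase letter
        System.out.println(inClass("[A-Z]", 'q'));   // false: lowercase letter
        System.out.println(inClass("[0-9]", '7'));   // true: digit
        System.out.println(inClass("[^ABC]", 'D'));  // true: anything except A, B, C
        System.out.println(inClass("[^ABC]", 'A'));  // false
        System.out.println(inClass("[^A-Z]", '7'));  // true: not an uppercase letter
        System.out.println(inClass("[ABC^]", '^'));  // true: ^ is a literal here
        System.out.println(inClass("[ABC^]", 'D'));  // false: no complement applied
    }
}
```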
You can also include two or more ranges in one character class. For example, "[a-zA-Z]" matches any character
a through z and A through Z. "[a-zA-Z0-9]" matches any letter a through z (uppercase or lowercase) and any
digit 0 through 9. Some examples of character classes are listed in Table 14-1.
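One common use of a multi-range class is to strip everything that is not a letter or a digit. The sketch below does this by complementing "[a-zA-Z0-9]" and passing it to String.replaceAll(); the class name MultiRangeDemo and the helper keepAlphanumerics are hypothetical.

```java
class MultiRangeDemo {
    // Hypothetical helper: removes every character that is not an
    // ASCII letter (a-z, A-Z) or a digit (0-9).
    static String keepAlphanumerics(String input) {
        return input.replaceAll("[^a-zA-Z0-9]", "");
    }

    public static void main(String[] args) {
        System.out.println(keepAlphanumerics("a_b-C9!")); // abC9
    }
}
```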
Table 14-1. Examples of Character Classes
Character Class    Meaning                                                                Category
---------------    -------------------------------------------------------------------    ----------------------
[abc]              Character a, b, or c                                                   Simple character class
[^xyz]             A character except x, y, and z                                         Complement or negation
[a-p]              Characters a through p                                                 Range
[a-cx-z]           Characters a through c, or x through z (that is, a, b, c, x, y, or z)  Union
[0-9&&[4-8]]       Intersection of two ranges (4, 5, 6, 7, or 8)                          Intersection
[a-z&&[^aeiou]]    All lowercase letters minus vowels (that is, all lowercase consonants) Subtraction
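The union, intersection, and subtraction forms from Table 14-1 can be explored with a sketch along these lines. The class name ClassOperations and the helper keep are hypothetical; the helper collects, in order, every character of the input that belongs to the given character class.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ClassOperations {
    // Hypothetical helper: returns the characters of input that match charClass.
    static String keep(String charClass, String input) {
        Matcher m = Pattern.compile(charClass).matcher(input);
        StringBuilder kept = new StringBuilder();
        while (m.find()) {
            kept.append(m.group());
        }
        return kept.toString();
    }

    public static void main(String[] args) {
        // Union: a through c, or x through z.
        System.out.println(keep("[a-cx-z]", "abcdwxyz"));          // abcxyz
        // Intersection: digits that are also in 4 through 8.
        System.out.println(keep("[0-9&&[4-8]]", "0123456789"));    // 45678
        // Subtraction: lowercase letters that are not vowels.
        System.out.println(keep("[a-z&&[^aeiou]]", "regex"));      // rgx
    }
}
```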
Predefined Character Classes
Some frequently used predefined character classes are listed in Table 14-2.
 