characters that starts with "h" and ends with "d". Try changing the definitions of regEx and str in the
previous example to:
String regEx = "h.d";
String str = "Ted and Ned Hodge hid their hod and huddled in the hedge.";
If you recompile and run the example again, the last two lines of output are:
Ted and Ned Hodge hid their hod and huddled in the hedge.
                  ^^^       ^^^     ^^^            ^^^
You can see that you didn't find "Hod" in Hodge because of the capital "H", but you found all the other
three-letter sequences beginning with "h" and ending with "d".
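The program from the previous example is not reproduced in this section, so the following is only a minimal, self-contained sketch of the same idea and not the book's listing. The class name DotWildcardDemo and the helper method markMatches are hypothetical names chosen for illustration. The sketch prints the string followed by a marker line with a caret under every matched character.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DotWildcardDemo {
    // Builds a line of '^' characters positioned under each match of regEx in str.
    // (Hypothetical helper, not part of the original example.)
    static String markMatches(String regEx, String str) {
        char[] marker = new char[str.length()];
        Arrays.fill(marker, ' ');
        Matcher m = Pattern.compile(regEx).matcher(str);
        while (m.find()) {
            // end() is exclusive, so this fills exactly the matched span
            Arrays.fill(marker, m.start(), m.end(), '^');
        }
        return new String(marker);
    }

    public static void main(String[] args) {
        String regEx = "h.d";
        String str = "Ted and Ned Hodge hid their hod and huddled in the hedge.";
        System.out.println(str);
        System.out.println(markMatches(regEx, str));
    }
}
```

Because the period matches any single character, the carets fall under "hid", "hod", "hud", and "hed", but not under "Hod", which starts with a capital letter.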
Of course, the regular expression "h.d" would also have found "hzd" or "hNd" if they had been present,
which is not what you want. You can limit the possibilities by replacing the period with just the collection
of characters you are looking for, placed between square brackets, like this:
String regEx = "h[aio]d";
The [aio] sequence of meta-characters defines what is called a simple class of characters, consisting in
this case of "a", "i", or "o". Here the term class is used in the sense of a set of characters, not a class that
defines a type. If you try this version of the regular expression in the previous example, the last two lines of
output are:
Ted and Ned Hodge hid their hod and huddled in the hedge.
                  ^^^       ^^^
The regular expression now matches all three-letter sequences that begin with "h" and end with "d" and have
a middle letter of "a" or "i" or "o".
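To compare the two expressions side by side, the following sketch collects the matched substrings into a list instead of drawing a marker line. It is an illustration under assumptions, not the book's code: the class name SimpleClassDemo and the helper findAll are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SimpleClassDemo {
    // Returns every substring of str that matches regEx, in order of appearance.
    // (Hypothetical helper introduced for this illustration.)
    static List<String> findAll(String regEx, String str) {
        List<String> found = new ArrayList<>();
        Matcher m = Pattern.compile(regEx).matcher(str);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String str = "Ted and Ned Hodge hid their hod and huddled in the hedge.";
        System.out.println(findAll("h.d", str));      // [hid, hod, hud, hed]
        System.out.println(findAll("h[aio]d", str));  // [hid, hod]
    }
}
```

The simple class [aio] rejects "hud" and "hed" because their middle letters are not in the set.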
You can define character classes in a regular expression in a variety of ways. Table 15-5 gives some
examples of the more useful forms:
TABLE 15-5: Character Classes in a Regular Expression
CLASS       DESCRIPTION
[aeiou]     This is a simple class that any of the characters between the square brackets match; in this
            example, any lowercase vowel. You used this form in the earlier code fragment to search for
            variations on "had".
[^aeiou]    This represents any character except those appearing to the right of the ^ character between the
            square brackets. Thus, here you have specified any character that is not a lowercase vowel. Note
            this is any character, not any letter, so the expression "h[^aeiou]d" looks for "h!d" or "h9d" as
            well as "hxd" or "hWd". Of course, it rejects "had" or "hid" or any other form with a lowercase
            vowel as the middle letter.
[a-e]       This defines an inclusive range: any of the letters "a" to "e" in this case. You can also specify
            multiple ranges. For example, [a-cs-zA-E] corresponds to any of the characters from "a" to "c",
            from "s" to "z", or from "A" to "E". If you want to specify that a position must contain a digit,
            you could use [0-9]. To specify that a position can be a letter or a digit, you could express it
            as [a-zA-Z0-9].
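The forms in the table can be checked quickly with Pattern.matches(), which tests whether an entire input string matches an expression. The following is a small sketch for experimentation; the class name CharacterClassDemo and the helper matches are hypothetical, and the sample inputs are chosen only for illustration.

```java
import java.util.regex.Pattern;

class CharacterClassDemo {
    // True only if the WHOLE of s matches regEx (Pattern.matches semantics).
    static boolean matches(String regEx, String s) {
        return Pattern.matches(regEx, s);
    }

    public static void main(String[] args) {
        // Negated class: any character that is not a lowercase vowel
        System.out.println(matches("h[^aeiou]d", "h!d"));  // true
        System.out.println(matches("h[^aeiou]d", "hWd"));  // true
        System.out.println(matches("h[^aeiou]d", "had"));  // false

        // Multiple ranges in one class
        System.out.println(matches("[a-cs-zA-E]", "b"));   // true
        System.out.println(matches("[a-cs-zA-E]", "m"));   // false
        System.out.println(matches("[a-cs-zA-E]", "D"));   // true

        // Digit, and letter-or-digit
        System.out.println(matches("[0-9]", "7"));         // true
        System.out.println(matches("[a-zA-Z0-9]", "_"));   // false
    }
}
```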
You can use any of these in combination with ordinary characters to form a regular expression. For
example, suppose you want to search some text for any sequence beginning with "b", "c", or "d", with "a" as
the middle letter, and ending with "d" or "t". You could define the regular expression to do this as:
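One expression that satisfies this description, offered here as a sketch derived from the stated requirement rather than as the book's own listing, is "[bcd]a[dt]" (the range form "[b-d]a[dt]" is equivalent). The class name CombinedClassDemo, the helper findAll, and the sample sentence are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CombinedClassDemo {
    // Returns every substring of text that matches regEx, in order of appearance.
    // (Hypothetical helper introduced for this illustration.)
    static List<String> findAll(String regEx, String text) {
        List<String> found = new ArrayList<>();
        Matcher m = Pattern.compile(regEx).matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        // First letter b, c, or d; middle letter a; last letter d or t
        String regEx = "[bcd]a[dt]";
        String text = "The bad cat sat on a mat with dad.";  // made-up sample text
        System.out.println(findAll(regEx, text));  // [bad, cat, dad]
    }
}
```

Here "sat" and "mat" are skipped because their first letters fall outside the class [bcd].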