Java Reference
Defining Sets of Characters
A regular expression can be made up of ordinary characters, which are upper and lower case letters and
digits, plus sequences of meta-characters that have a special meaning. The pattern in the previous
example was just the word "had", but what if we wanted to search a string for occurrences of "hid" or
"hod" as well as "had", or even any three-letter word beginning with 'h' and ending with 'd'?
You can deal with any of these possibilities with regular expressions. One option is to specify the
middle character as a wildcard by using a period, which is one example of a meta-character. This
meta-character matches any character except end-of-line, so the regular expression "h.d" represents
any sequence of three characters that starts with 'h' and ends with 'd'. Try changing the definitions of
regEx and str in the previous example to:
String regEx = "h.d";
String str = "Ted and Ned Hodge hid their hod and huddled in the hedge.";
If you recompile and run the example again, the last two lines of output will be:
Ted and Ned Hodge hid their hod and huddled in the hedge.
                  ^^^       ^^^     ^^^            ^^^
You can see that we didn't find "Hod" in Hodge because of the capital 'H', but we found all the other
sequences beginning with 'h' and ending with 'd'.
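The example program referred to above is not reproduced in this section, so here is a minimal, self-contained sketch of the same idea. The class name WildcardDemo and the helper markerLine are illustrative names, not taken from the original example; the sketch only assumes the standard java.util.regex classes Pattern and Matcher.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WildcardDemo {
    // Hypothetical helper: builds a line of the same length as str with '^'
    // under every character that belongs to a match of regEx, spaces elsewhere.
    static String markerLine(String regEx, String str) {
        char[] marker = new char[str.length()];
        Arrays.fill(marker, ' ');
        Matcher m = Pattern.compile(regEx).matcher(str);
        while (m.find()) {
            // m.start() is inclusive and m.end() is exclusive
            Arrays.fill(marker, m.start(), m.end(), '^');
        }
        return new String(marker);
    }

    public static void main(String[] args) {
        String regEx = "h.d";
        String str = "Ted and Ned Hodge hid their hod and huddled in the hedge.";
        System.out.println(str);
        System.out.println(markerLine(regEx, str));
    }
}
```

Run as written, this should mark "hid", "hod", the "hud" in huddled, and the "hed" in hedge, and leave "Hod" unmarked because matching is case-sensitive by default.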
Of course, the regular expression "h.d" would also have found "hzd" or "hNd" if they had been
present, which is not what we want. We can limit the possibilities by replacing the period with just the
collection of characters we are looking for between square brackets, like this:
String regEx = "h[aio]d";
The [aio] sequence of meta-characters defines what is called a simple class of characters, consisting in
this case of 'a', 'i', or 'o'. Here the term 'class' is used in the sense of a set of characters, not a class that
defines a type. If you try this version of the regular expression in the previous example, the last two
lines of output will be:
Ted and Ned Hodge hid their hod and huddled in the hedge.
                  ^^^       ^^^
This now finds all sequences that begin with 'h' and end with 'd' and have 'a', 'i', or 'o' as the middle letter.
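To see the difference between the wildcard and the simple class side by side, the following sketch runs both patterns over a made-up test string that contains the unwanted sequences "hzd" and "hNd". The class name SimpleClassDemo, the helper findAll, and the test string are assumptions chosen for illustration, not part of the original example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SimpleClassDemo {
    // Hypothetical helper: collects the text of every match of regEx in str.
    static List<String> findAll(String regEx, String str) {
        List<String> found = new ArrayList<>();
        Matcher m = Pattern.compile(regEx).matcher(str);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        // Illustrative input, not from the original text
        String str = "had hid hod hud hzd hNd";
        // The period accepts any middle character
        System.out.println(findAll("h.d", str));     // [had, hid, hod, hud, hzd, hNd]
        // The simple class accepts only 'a', 'i', or 'o' in the middle
        System.out.println(findAll("h[aio]d", str)); // [had, hid, hod]
    }
}
```

The character class narrows the match to exactly the three middle letters listed between the square brackets, so "hud", "hzd", and "hNd" are no longer found.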
There are a variety of ways in which you can define character classes in a regular expression. Here are
some examples of the more useful forms: