[aeiou]
This is a simple class that matches any one of the characters between the square
brackets; in this example, any lowercase vowel. We used this form in the code
fragment above to search for variations on "had".
[^aeiou]
This matches any single character except those that follow the ^ character
between the square brackets. Thus here we have specified any character that is not
a lowercase vowel. Note that this is any character, not any letter, so the expression
"h[^aeiou]d" will find "h!d" or "h9d" as well as "hxd" or "hWd". Of course, it will
reject "had" or "hid" or any other form with a lowercase vowel as the middle letter.
[a-e]
This defines an inclusive range: any of the letters 'a' to 'e' in this case. You can
also specify multiple ranges, for example:
[a-cs-zA-E]
This corresponds to any of the characters from 'a' to 'c', from 's' to 'z', or from 'A'
to 'E'.
If you want to specify that a position must contain a digit, you could use [0-9]. To
specify that a position can be a letter or a digit, you could express it as
[a-zA-Z0-9].
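The character classes in the table above can be tried out with a few lines of code. What follows is only a minimal sketch: the class name CharClassDemo and the helper matchesWhole() are made up for illustration, and the test strings are arbitrary. It relies on the static Pattern.matches() method, which reports whether the whole input matches the expression.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; the name and the helper method are illustrative only.
class CharClassDemo {
    // Returns true when the entire input matches the regular expression.
    static boolean matchesWhole(String regEx, String input) {
        return Pattern.matches(regEx, input);
    }

    public static void main(String[] args) {
        // Simple class: the middle character must be a lowercase vowel.
        System.out.println(matchesWhole("h[aeiou]d", "had"));    // true

        // Negated class: any character except a lowercase vowel.
        System.out.println(matchesWhole("h[^aeiou]d", "h9d"));   // true
        System.out.println(matchesWhole("h[^aeiou]d", "hid"));   // false

        // Multiple ranges in one class.
        System.out.println(matchesWhole("[a-cs-zA-E]", "D"));    // true
        System.out.println(matchesWhole("[a-cs-zA-E]", "m"));    // false

        // Letter or digit.
        System.out.println(matchesWhole("[a-zA-Z0-9]", "7"));    // true
        System.out.println(matchesWhole("[a-zA-Z0-9]", "_"));    // false
    }
}
```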
Any of these can be used in combination with ordinary characters to form a regular expression. For
example, suppose we wanted to search some text for any three-character sequence beginning with 'b', 'c',
or 'd', with 'a' as the middle letter, and ending with 'd' or 't'. The regular expression to do this could
be defined as:
String regEx = "[b-d]a[dt]";
This will search for any occurrence of "bad", "cad", "dad", "bat", "cat", or "dat".
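To see this expression at work, here is a short sketch that collects every match in a piece of text. The class name FindBadCat, the helper findAll(), and the sample sentence are all assumptions made for this illustration; only the expression itself comes from the text above. The loop uses Matcher.find() to step through successive matches and Matcher.group() to retrieve each one.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; names and sample text are illustrative only.
class FindBadCat {
    // Collects every substring of text that matches regEx, in order of appearance.
    static List<String> findAll(String regEx, String text) {
        List<String> found = new ArrayList<>();
        Matcher m = Pattern.compile(regEx).matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String regEx = "[b-d]a[dt]";
        String text = "The bad cat sat with dad and a bat.";  // made-up sample text
        // "sat" is skipped because 's' is outside the range b-d.
        System.out.println(findAll(regEx, text));  // [bad, cat, dad, bat]
    }
}
```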
Logical Operators in Regular Expressions
You can use the && operator to combine classes that define sets of characters. This is particularly useful
when it is combined with the negation operator, ^, that appears in the second line of the table above.
For instance, if you want to specify that any lowercase consonant is acceptable, you could write it as:
"[b-df-hj-np-tv-z]"
However, it can be expressed much more conveniently as:
"[a-z&&[^aeiou]]"
This produces the intersection (in other words, the characters common to both sets) of the set of
characters 'a' through 'z' with the set of characters that are not lowercase vowels. To put it another way,
the lowercase vowels are subtracted from the set 'a' through 'z', so we are left with just the consonants.
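One way to convince yourself that the two forms are equivalent is to test every ASCII character against each of them and compare the results. The sketch below does that; the class name ConsonantClassDemo and the helper accepted() are hypothetical names introduced only for this check.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; names are illustrative only.
class ConsonantClassDemo {
    static final Pattern EXPLICIT = Pattern.compile("[b-df-hj-np-tv-z]");
    static final Pattern INTERSECTION = Pattern.compile("[a-z&&[^aeiou]]");

    // Returns, in code order, every ASCII character that the pattern matches on its own.
    static String accepted(Pattern p) {
        StringBuilder sb = new StringBuilder();
        for (char c = 0; c < 128; c++) {
            if (p.matcher(String.valueOf(c)).matches()) {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(accepted(EXPLICIT));      // bcdfghjklmnpqrstvwxyz
        System.out.println(accepted(INTERSECTION));  // bcdfghjklmnpqrstvwxyz
    }
}
```

Both lines list the same 21 lowercase consonants, so the intersection form accepts exactly the same characters as the explicit ranges.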
The | operator is a logical OR that you use to specify alternatives. A regular expression to find "hid",
"had", or "hod" could be written as "hid|had|hod". You can try this in the previous example by
changing the definition of regEx to:
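Independently of the previous example, the alternation "hid|had|hod" can also be exercised in a stand-alone sketch. Everything here apart from the expression is an assumption made for illustration: the class name AlternationDemo, the helper findAll(), and the sample sentence.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; names and sample text are illustrative only.
class AlternationDemo {
    // Collects every substring of text that matches regEx, in order of appearance.
    static List<String> findAll(String regEx, String text) {
        List<String> found = new ArrayList<>();
        Matcher m = Pattern.compile(regEx).matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String regEx = "hid|had|hod";                  // any one of the three alternatives
        String text = "He had hid the hod in a hut.";  // made-up sample text
        // "hut" is not one of the listed alternatives, so it is not reported.
        System.out.println(findAll(regEx, text));      // [had, hid, hod]
    }
}
```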