[aeiou]
This is a simple class that matches any one of the characters between the square brackets;
in this example, any lowercase vowel. We used this form in the code fragment above to search
for variations on "had".
[^aeiou]
This represents any character except those appearing to the right of the ^ character
between the square brackets. Thus here we have specified any character that is not
a lowercase vowel. Note that this is any character, not any letter, so the expression
"h[^aeiou]d" will match "h!d" or "h9d" as well as "hxd" or "hWd". Of course, it will reject
"had" or "hid" or any other form with a lowercase vowel as the middle letter.
[a-e]
This defines an inclusive range: any of the letters 'a' to 'e' in this case. You can
also specify multiple ranges, for example:
[a-cs-zA-E]
This corresponds to any of the characters from 'a' to 'c', from 's' to 'z', or from 'A'
to 'E'.
If you want to specify that a position must contain a digit you could use [0-9]. To
specify that a position can be a letter or a digit you could express it as [a-zA-Z0-9].
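To see how the character classes in the table behave, we can try a few of them against the same set of sample strings. The following is only a minimal sketch: the class name CharClassDemo, the helper matchesWhole(), and the patterns that wrap each class between 'h' and 'd' are our own choices for illustration. It relies on the standard Pattern.matches() method, which requires the whole string to match.

```java
import java.util.regex.Pattern;

public class CharClassDemo {
    // Hypothetical helper: true only if the entire input matches regEx.
    static boolean matchesWhole(String regEx, String input) {
        return Pattern.matches(regEx, input);
    }

    public static void main(String[] args) {
        // Each pattern places one character class from the table between 'h' and 'd'.
        String[] patterns = {"h[aeiou]d", "h[^aeiou]d", "h[a-e]d", "h[0-9]d", "h[a-zA-Z0-9]d"};
        String[] samples = {"had", "hid", "hxd", "hWd", "h9d", "h!d"};

        for (String regEx : patterns) {
            StringBuilder hits = new StringBuilder();
            for (String s : samples) {
                if (matchesWhole(regEx, s)) {
                    hits.append(s).append(' ');
                }
            }
            System.out.println(regEx + " matches: " + hits.toString().trim());
        }
    }
}
```

Running this should show, for instance, that "h[^aeiou]d" accepts "h!d" and "h9d" but not "had", while "h[a-zA-Z0-9]d" accepts everything in the sample set except "h!d".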
Any of these can be used in combination with ordinary characters to form a regular expression. For
example, suppose we wanted to search some text for any sequence beginning with 'b', 'c', or 'd', with
'a' as the middle letter, and ending with 'd' or 't'. The regular expression to do this could be
defined as:
String regEx = "[b-d]a[dt]";
This will search for any occurrence of "bad", "cad", "dad", "bat", "cat", or "dat".
Logical Operators in Regular Expressions
You can use the && operator to combine classes that define sets of characters. This is particularly
useful when it is combined with the negation operator, ^, that appears in the second line of the
table above.
For instance, if you want to specify that any lower case consonant is acceptable, you could write it as:
"[b-df-hj-np-tv-z]"
However, it can much more conveniently be expressed as:
[a-z&&[^aeiou]]
This produces the intersection (in other words, the characters common to both sets) of the set of
characters 'a' through 'z' with the set of characters that are not lowercase vowels. To put it
another way, the lowercase vowels are subtracted from the set 'a' through 'z', so we are left with
just the consonants.
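One way to convince ourselves that the two forms are equivalent is to test every character against both. The sketch below does this for the ASCII range; the class name IntersectionDemo and the helpers matchesChar() and agreeOnAscii() are hypothetical names introduced only for this check.

```java
import java.util.regex.Pattern;

public class IntersectionDemo {
    static final Pattern EXPLICIT = Pattern.compile("[b-df-hj-np-tv-z]");
    static final Pattern INTERSECTION = Pattern.compile("[a-z&&[^aeiou]]");

    // Hypothetical helper: true if the single character c matches the pattern.
    static boolean matchesChar(Pattern p, char c) {
        return p.matcher(String.valueOf(c)).matches();
    }

    // True if both patterns accept exactly the same characters in the ASCII range 0 to 127.
    static boolean agreeOnAscii() {
        for (char c = 0; c < 128; c++) {
            if (matchesChar(EXPLICIT, c) != matchesChar(INTERSECTION, c)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Build the list of lowercase letters accepted by the intersection form.
        StringBuilder consonants = new StringBuilder();
        for (char c = 'a'; c <= 'z'; c++) {
            if (matchesChar(INTERSECTION, c)) {
                consonants.append(c);
            }
        }
        System.out.println(consonants);
        System.out.println("Same as explicit ranges: " + agreeOnAscii());
    }
}
```

The intersection form is easier to read and harder to get wrong than the explicit ranges, where a single mistyped boundary letter would silently admit a vowel or drop a consonant.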
The | operator is a logical OR that you use to specify alternatives. A regular expression to find
"hid", "had", or "hod" could be written as "hid|had|hod". You can try this in the previous example by
changing the definition of regEx to: