Java Reference
In-Depth Information
String regEx = "[b-d]a[dt]";
This expression matches any occurrence of "bad", "cad", "dad", "bat", "cat", or "dat".
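As a minimal sketch of how this pattern behaves, the following collects every match in a string. The class name CharClassDemo, the helper findAll(), and the sample text are assumptions made for illustration and are not part of the original example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CharClassDemo {
    // Returns every substring of text matched by regEx, in order of appearance.
    static List<String> findAll(String regEx, String text) {
        List<String> found = new ArrayList<>();
        Matcher m = Pattern.compile(regEx).matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String regEx = "[b-d]a[dt]";
        // Hypothetical sample text: "sat" and "mat" are skipped because
        // their first letter lies outside the range b-d.
        String text = "The bad cat sat on a mat with dad";
        System.out.println(findAll(regEx, text)); // prints [bad, cat, dad]
    }
}
```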
Logical Operators in Regular Expressions
You can use the && operator to combine classes that define sets of characters. This is particularly useful
when you use it combined with the negation operator, ^ , that appears in the second row of the table in the
preceding section. For example, if you want to specify that any lowercase consonant is acceptable, you could
write the expression that matches this as:
"[b-df-hj-np-tv-z]"
However, this can be expressed much more conveniently as the following pattern:
"[a-z&&[^aeiou]]"
This produces the intersection (in other words, the characters common to both sets) of the set of characters "a" through "z" with the set of all characters that are not lowercase vowels. To put it another way, the lowercase vowels
are subtracted from the set "a" through "z" so you are left with just the lowercase consonants.
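One way to convince yourself that the two forms describe the same set is to compare them character by character. The sketch below does this for the ASCII range; the class and method names are hypothetical and chosen only for this illustration.

```java
import java.util.regex.Pattern;

class ConsonantDemo {
    // The explicit list of ranges and the intersection form from the text.
    static final Pattern EXPLICIT = Pattern.compile("[b-df-hj-np-tv-z]");
    static final Pattern INTERSECTION = Pattern.compile("[a-z&&[^aeiou]]");

    // True if c is matched by the intersection form.
    static boolean isLowercaseConsonant(char c) {
        return INTERSECTION.matcher(String.valueOf(c)).matches();
    }

    // True if both patterns agree on every ASCII character (codes 0 to 127).
    static boolean sameOnAscii() {
        for (char c = 0; c < 128; c++) {
            String s = String.valueOf(c);
            if (EXPLICIT.matcher(s).matches() != INTERSECTION.matcher(s).matches()) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isLowercaseConsonant('k')); // prints true
        System.out.println(isLowercaseConsonant('e')); // prints false
        System.out.println(sameOnAscii());             // prints true
    }
}
```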
The | operator is a logical OR that you use to specify alternatives. A regular expression to find "hid",
"had", or "hod" could be written as "hid|had|hod". You can try this in the previous example by changing
the definition of regEx to:
String regEx = "hid|had|hod";
Note that the | operation means either the whole expression to the left of the operator or the whole
expression to the right, not just the characters on either side as alternatives.
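The scope of | can be seen in a short sketch. The class name AlternationDemo, the helper findAll(), and both sample strings are assumptions for illustration only. The second call shows that "hi|ad" means "hi" or "ad", not "h", then "i" or "a", then "d".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AlternationDemo {
    // Returns every substring of text matched by regEx, in order of appearance.
    static List<String> findAll(String regEx, String text) {
        List<String> found = new ArrayList<>();
        Matcher m = Pattern.compile(regEx).matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        // Hypothetical sample text; the last match comes from "hidden".
        String text = "She hid the hat she had; the hod was hidden.";
        System.out.println(findAll("hid|had|hod", text)); // prints [hid, had, hod, hid]

        // The alternatives are the whole expressions "hi" and "ad".
        System.out.println(findAll("hi|ad", "hid had"));  // prints [hi, ad]
    }
}
```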
You could also use the | operator to define an expression to find sequences beginning with an uppercase
or lowercase "h", followed by a lowercase vowel, and ending in "d", like this:
String regEx = "(h|H)[aeiou]d";
The parentheses enclose the choice of "h" or "H". The pair of square brackets
determines that the next character is any lowercase vowel. The last character must always be "d". With this
as the regular expression in the example, the "Hod" in Hodge is found as well as the other variations.
Note that within square brackets | is an ordinary character, so "[h|H]" would also accept a literal "|" as the
first character. The equivalent character class is simply "[hH]".
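Inside square brackets the | character has no special meaning and is matched literally, so the choice between "h" and "H" needs either a parenthesized group or the class "[hH]". The following sketch compares the three spellings; the class and method names are hypothetical.

```java
import java.util.regex.Pattern;

class GroupVsClassDemo {
    // True if the whole of text is matched by regEx.
    static boolean matchesWhole(String regEx, String text) {
        return Pattern.matches(regEx, text);
    }

    public static void main(String[] args) {
        // A group with | and a plain character class both accept h or H.
        System.out.println(matchesWhole("(h|H)[aeiou]d", "Hod")); // prints true
        System.out.println(matchesWhole("[hH][aeiou]d", "hid"));  // prints true

        // Within [...] the | is a literal member of the class,
        // so the string "|ad" is accepted as well.
        System.out.println(matchesWhole("[h|H][aeiou]d", "|ad")); // prints true
        System.out.println(matchesWhole("(h|H)[aeiou]d", "|ad")); // prints false
    }
}
```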
Predefined Character Sets
You also have a number of predefined character classes that provide you with a shorthand notation for
commonly used sets of characters. Table 15-6 gives some that are particularly useful:
TABLE 15-6 : Predefined Character Classes
CHARACTER CLASS   DESCRIPTION
.                 This represents any character, as you have already seen.
\d                This represents any digit and is therefore shorthand for [0-9].
\D                This represents any character that is not a digit. It is therefore equivalent to [^0-9].
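In a Java string literal the backslash must itself be escaped, so \d is written as "\\d". The sketch below applies the classes from the table to a sample string. The class name, method names, and sample text are assumptions for illustration. It also shows that, by default, "." does not match a line terminator.

```java
class PredefinedClassDemo {
    // Removes every non-digit character, leaving only the digits.
    static String digitsOnly(String s) {
        return s.replaceAll("\\D", "");
    }

    // Replaces each digit with a '#' character.
    static String maskDigits(String s) {
        return s.replaceAll("\\d", "#");
    }

    public static void main(String[] args) {
        String text = "Order 66 shipped in 2 boxes"; // hypothetical sample text
        System.out.println(digitsOnly(text)); // prints 662
        System.out.println(maskDigits(text)); // prints Order ## shipped in # boxes

        // "." matches any single character except, by default, a line terminator.
        System.out.println("a".matches("."));  // prints true
        System.out.println("\n".matches(".")); // prints false
    }
}
```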
 
 