String regEx = "hid|had|hod";
Note that the | operator means either the whole expression to the left of the operator or the whole
expression to the right, not just the characters on either side of it as alternatives.
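A minimal sketch of this alternation in use, assuming the standard java.util.regex.Pattern and Matcher classes. The helper findAll and the sample sentence are hypothetical, made up for illustration. The second pattern shows that restricting the choice to a single character position takes parentheses, because | otherwise separates whole expressions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AlternationDemo {
    // Hypothetical helper: collects every match of regEx in text, in order.
    static List<String> findAll(String regEx, String text) {
        List<String> matches = new ArrayList<>();
        Matcher m = Pattern.compile(regEx).matcher(text);
        while (m.find()) {
            matches.add(m.group()); // the whole matched sequence
        }
        return matches;
    }

    public static void main(String[] args) {
        // Hypothetical sample text.
        String text = "He hid, she had a hod, and they hum.";

        // Each alternative is a complete expression: hid OR had OR hod.
        System.out.println(findAll("hid|had|hod", text)); // [hid, had, hod]

        // To alternate only the middle character, the choice must be grouped.
        System.out.println(findAll("h(i|a|o)d", text));   // [hid, had, hod]
    }
}
```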
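You could also use the | operator to define an expression that finds sequences beginning with an
uppercase or lowercase 'h', followed by a vowel, and ending in 'd'. Because | separates whole
expressions, the alternatives must be enclosed in parentheses to limit them to the first character:
String regEx = "(h|H)[aeiou]d";
Inside square brackets, | has no special meaning; it is just another character in the set, so
"[h|H][aeiou]d" would also match a sequence such as "|od". The character class [hH] is a simpler way
to express the same choice as (h|H).
With this as the regular expression in the example, the "Hod" in Hodge will be found as well as the
other variations.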
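A small sketch comparing the ways of writing the "upper or lower case h" choice, assuming java.util.regex. The helper findAll and the sample text (which deliberately contains "|od") are hypothetical. It shows that (h|H) and [hH] behave identically, while [h|H] also accepts a literal vertical bar, because | is an ordinary character inside square brackets.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CaseChoiceDemo {
    // Hypothetical helper: collects every match of regEx in text, in order.
    static List<String> findAll(String regEx, String text) {
        List<String> matches = new ArrayList<>();
        Matcher m = Pattern.compile(regEx).matcher(text);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        // Hypothetical sample text; "|od" is included on purpose.
        String text = "Hodge had a hid |od";

        // Both forms accept exactly one of 'h' or 'H' as the first character.
        System.out.println(findAll("(h|H)[aeiou]d", text)); // [Hod, had, hid]
        System.out.println(findAll("[hH][aeiou]d", text));  // [Hod, had, hid]

        // Inside [...], '|' is a literal member of the set, so "|od" matches too.
        System.out.println(findAll("[h|H][aeiou]d", text)); // [Hod, had, hid, |od]
    }
}
```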
Predefined Character Sets
There are also a number of predefined character classes that provide you with a shorthand notation for
commonly used sets of characters. Here are some that are particularly useful:
.     This represents any character, as we have already seen (by default, any character except a
      line terminator such as a newline).
\d    This represents any digit and is therefore shorthand for [0-9].
\D    This represents any character that is not a digit. It is therefore equivalent to [^0-9].
\s    This represents any whitespace character: a space, tab, newline, vertical tab, form feed, or
      carriage return. It is therefore equivalent to [ \t\n\x0B\f\r].
\S    This represents any non-whitespace character and is therefore equivalent to [^\s].
\w    This represents a word character, which is an uppercase or lowercase letter, a digit, or an
      underscore. It is therefore equivalent to [a-zA-Z_0-9].
\W    This represents any character that is not a word character, so it is equivalent to [^\w].
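One way to convince yourself that these shorthands really are equivalent to the bracketed forms is to test both against every ASCII character. The sketch below does that with java.util.regex.Pattern; the helper name sameOnAscii is hypothetical, and the check covers only ASCII (characters 0 to 127), not all of Unicode.

```java
import java.util.regex.Pattern;

class PredefinedClassDemo {
    // Hypothetical helper: true if both patterns accept exactly the same
    // single ASCII characters (0 to 127).
    static boolean sameOnAscii(String shorthand, String longhand) {
        Pattern a = Pattern.compile(shorthand);
        Pattern b = Pattern.compile(longhand);
        for (char c = 0; c < 128; c++) {
            String s = String.valueOf(c);
            if (a.matcher(s).matches() != b.matcher(s).matches()) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(sameOnAscii("\\d", "[0-9]"));                // true
        System.out.println(sameOnAscii("\\D", "[^0-9]"));               // true
        System.out.println(sameOnAscii("\\s", "[ \\t\\n\\x0B\\f\\r]")); // true
        System.out.println(sameOnAscii("\\S", "[^\\s]"));               // true
        System.out.println(sameOnAscii("\\w", "[a-zA-Z_0-9]"));         // true
        System.out.println(sameOnAscii("\\W", "[^\\w]"));               // true

        // By default '.' does not match a line terminator such as '\n'.
        System.out.println(Pattern.matches(".", "x"));  // true
        System.out.println(Pattern.matches(".", "\n")); // false
    }
}
```

Note that each regular expression is written with doubled backslashes, because the patterns are Java string literals.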
Note that when you include any of the sequences that start with a backslash in a regular expression,
you need to keep in mind that Java treats a backslash in a string literal as the beginning of an escape
sequence. You must therefore write each backslash in the regular expression as \\. For instance, to find
a sequence of three digits, the regular expression would be "\\d\\d\\d". This is a consequence of
writing regular expressions as Java string literals, so it doesn't apply in environments such as Perl,
where regular expressions have their own literal syntax.
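A short sketch of the three-digit search, assuming java.util.regex. The helper findAll and the sample text are hypothetical. It also prints the string itself to show that the source text "\\d\\d\\d" produces the six-character regular expression \d\d\d that the regex engine actually sees.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ThreeDigitsDemo {
    // Hypothetical helper: collects every match of regEx in text, in order.
    static List<String> findAll(String regEx, String text) {
        List<String> matches = new ArrayList<>();
        Matcher m = Pattern.compile(regEx).matcher(text);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        // Written "\\d\\d\\d" in source; the String holds \d\d\d.
        String regEx = "\\d\\d\\d";
        System.out.println(regEx);          // \d\d\d
        System.out.println(regEx.length()); // 6

        // Hypothetical sample text. Matches do not overlap, so "98765"
        // yields "987" and the remaining "65" is too short to match.
        System.out.println(findAll(regEx, "Room 101, ext 42, code 98765")); // [101, 987]
    }
}
```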
You may well want to include a period, or any of the other metacharacters, as part of the character
sequence you are looking for. To do this, you precede the character with a backslash in the
expression. Since Java strings interpret a backslash as the start of a Java escape sequence, the
backslash itself has to be written as \\, just as with the predefined character sets that begin with a
backslash. Thus the regular expression to find the sequence "had." would be "had\\.".
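The difference an escaped period makes can be seen in a small sketch, assuming java.util.regex. The helper findAll and the sample text are hypothetical. The last line uses Pattern.quote(), a standard method that returns a pattern matching its argument literally, as an alternative to escaping each metacharacter by hand.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EscapedPeriodDemo {
    // Hypothetical helper: collects every match of regEx in text, in order.
    static List<String> findAll(String regEx, String text) {
        List<String> matches = new ArrayList<>();
        Matcher m = Pattern.compile(regEx).matcher(text);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        String text = "hadn't had. hads"; // hypothetical sample text

        // Unescaped, the period matches any character after "had".
        System.out.println(findAll("had.", text));   // [hadn, had., hads]

        // Escaped, it matches only a literal period.
        System.out.println(findAll("had\\.", text)); // [had.]

        // Pattern.quote() treats the whole string as literal text.
        System.out.println(findAll(Pattern.quote("had."), text)); // [had.]
    }
}
```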
Our earlier search with the expression "h.d" found embedded sequences such as "hud" in the word
huddled. We could use the \s set, which corresponds to any whitespace character, to prevent this by
defining regEx like this: