CHARACTER CLASS    DESCRIPTION
\s                 This represents any whitespace character. A whitespace character is a space, a tab '\t', a
                   newline character '\n', a form feed character '\f', a carriage return '\r', or a vertical
                   tab '\x0B'.
\S                 This represents any non-whitespace character and is therefore equivalent to [^\s].
\w                 This represents a word character, which corresponds to an upper- or lowercase letter, a
                   digit, or an underscore. It is therefore equivalent to [a-zA-Z_0-9].
\W                 This represents any character that is not a word character, so it is equivalent to [^\w].
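As a quick check of these predefined classes, the following minimal sketch tests single characters against each one. The class name PredefinedClassDemo and the helper matchesWhole() are hypothetical names chosen for illustration; only Pattern.matches() comes from the standard library.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; the names here are illustrative only.
class PredefinedClassDemo {
    // Returns true if the whole of text matches regex.
    static boolean matchesWhole(String regex, String text) {
        return Pattern.matches(regex, text);
    }

    public static void main(String[] args) {
        String verticalTab = String.valueOf((char) 0x0B);
        System.out.println(matchesWhole("\\s", "\t"));        // tab is whitespace
        System.out.println(matchesWhole("\\s", verticalTab)); // so is a vertical tab
        System.out.println(matchesWhole("\\S", "x"));         // a letter is non-whitespace
        System.out.println(matchesWhole("\\w", "_"));         // underscore is a word character
        System.out.println(matchesWhole("\\w", "-"));         // hyphen is not a word character
        System.out.println(matchesWhole("\\W", "-"));         // so it matches \W instead
    }
}
```

Every call prints true except the test of "-" against \w, which prints false.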
Note that when you use any of the sequences that start with a backslash in a regular expression, you
need to keep in mind that Java treats a backslash in a string literal as the beginning of an escape sequence.
Therefore, you must write each backslash in the regular expression as \\. For example, to find a sequence of
three digits, the regular expression would be "\\d\\d\\d". The doubling arises from the significance of the
backslash in Java strings rather than from the regular expression syntax itself, so it doesn't necessarily
apply to other environments that support regular expressions, such as Perl.
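To make the doubling concrete, here is a small sketch that searches for three consecutive digits. The class ThreeDigitsDemo, the method firstThreeDigits(), and the sample text are assumptions made for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the names here are illustrative only.
class ThreeDigitsDemo {
    // The Java literal "\\d\\d\\d" holds the six characters \d\d\d.
    static final String REGEX = "\\d\\d\\d";

    // Returns the first run of three digits in text, or null if there is none.
    static String firstThreeDigits(String text) {
        Matcher m = Pattern.compile(REGEX).matcher(text);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(REGEX.length());                // 6
        System.out.println(firstThreeDigits("Room 4217")); // 421
    }
}
```

The string length of 6 confirms that the regular expression engine receives one backslash per digit class, not two.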
You may well want to include a period, or any of the other meta-characters, as part of the
character sequence you are looking for. To do this you can use an escape sequence starting with a backslash
in the expression to define such characters. Because Java strings interpret a backslash as the start of a Java
escape sequence, the backslash itself has to be represented as \\, the same as when using the predefined
character sets that begin with a backslash. Thus, the regular expression to find the sequence "had." would
be "had\\.".
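The difference the escape makes can be seen by counting matches with and without it. This is a minimal sketch; the class EscapedPeriodDemo, the method countMatches(), and the sample sentence are assumptions made for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the names here are illustrative only.
class EscapedPeriodDemo {
    // Counts the non-overlapping matches of regex in text.
    static int countMatches(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String text = "I had one, and that was all I had.";
        // Unescaped period matches any character, so "had " counts as well.
        System.out.println(countMatches("had.", text));   // 2
        // Escaped period matches only a literal period.
        System.out.println(countMatches("had\\.", text)); // 1
    }
}
```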
The earlier search you tried with the expression "h.d" found embedded sequences such as "hud" in the
word huddled. You could use the \s set that corresponds to any whitespace character to prevent this by
defining regEx like this:
String regEx = "\\sh.d\\s";
This searches for a five-character sequence that starts and ends with any whitespace character. The output
from the example is now:
Ted and Ned Hodge hid their hod and huddled in the hedge.
                 ^^^^^     ^^^^^
You can see that the marker array shows the five-character sequences that were found. The embedded
sequences are now no longer included, as they don't begin and end with a whitespace character.
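The marker output can be reproduced with a short sketch along the following lines. It is not the original example program: the class MarkerDemo and the method markers() are hypothetical, and the marker line is built here with Arrays.fill() over the range reported by each match.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the names here are illustrative only.
class MarkerDemo {
    // Builds a line as long as text with '^' under every matched character.
    static String markers(String regex, String text) {
        char[] marks = new char[text.length()];
        Arrays.fill(marks, ' ');
        Matcher m = Pattern.compile(regex).matcher(text);
        while (m.find()) {
            // start() is inclusive and end() is exclusive, as Arrays.fill expects.
            Arrays.fill(marks, m.start(), m.end(), '^');
        }
        return new String(marks);
    }

    public static void main(String[] args) {
        String text = "Ted and Ned Hodge hid their hod and huddled in the hedge.";
        String regEx = "\\sh.d\\s";
        System.out.println(text);
        System.out.println(markers(regEx, text));
    }
}
```

Because each match consumes its surrounding whitespace, two candidate words separated by a single space could not both be found; in this sentence that situation does not arise.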
To take another example, suppose you want to find hedge or Hodge as words in the sentence, bearing in
mind that there's a period at the end. You could do this by defining the regular expression as:
String regEx = "\\s[hH][eo]dge[\\s\\.]";
The first character is defined as any whitespace by \\s. The next character is defined as either "h" or
"H" by [hH]. This can be followed by either "e" or "o", specified by [eo]. This is followed by the plain text
dge with either a whitespace character or a period at the end, specified by [\\s\\.]. No | is needed between
the alternatives inside square brackets: there, | is an ordinary character, so [h|H] would also match a
literal "|". This expression doesn't cater to all possibilities. Sequences at the beginning of the string are
not found, for example, nor are sequences followed by a comma. You see how to deal with these next.
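The search for hedge or Hodge, and the two limitations just mentioned, can be tried out with a sketch like the one below. The class WordSearchDemo and the method findAll() are hypothetical names. The pattern lists the alternatives inside square brackets without a | character, since within a character class | stands for itself rather than for alternation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the names here are illustrative only.
class WordSearchDemo {
    static final String REGEX = "\\s[hH][eo]dge[\\s\\.]";

    // Collects every non-overlapping match of regex in text.
    static List<String> findAll(String regex, String text) {
        List<String> found = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String text = "Ted and Ned Hodge hid their hod and huddled in the hedge.";
        // Finds " Hodge " and " hedge." including the delimiters.
        System.out.println(findAll(REGEX, text));
        // Not found: no whitespace precedes a word at the start of the string.
        System.out.println(findAll(REGEX, "Hedge rows are tall."));
        // Not found: a comma is neither whitespace nor a period.
        System.out.println(findAll(REGEX, "The hedge, newly cut."));
        // Inside a character class, | is a literal character.
        System.out.println(Pattern.matches("[h|H]", "|"));
    }
}
```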
Matching Boundaries