A Collection of Useful Classes - Beginning Java


CHARACTER CLASS    DESCRIPTION

\s    This represents any whitespace character. A whitespace character is a space, a tab '\t', a newline character '\n', a form feed character '\f', a carriage return '\r', or a vertical tab '\x0B'.

\S    This represents any non-whitespace character and is therefore equivalent to [^\s].

\w    This represents a word character, which corresponds to an upper- or lowercase letter, a digit, or an underscore. It is therefore equivalent to [a-zA-Z_0-9].

\W    This represents any character that is not a word character, so it is equivalent to [^\w].
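As a quick check of the four predefined sets in the table, the following minimal sketch tests single characters against each one. The class name PredefinedClassesDemo and the helper matchesAll() are made up for this illustration; Pattern.matches() is the standard java.util.regex call. The backslashes are doubled because the expressions are written as Java string literals.

```java
import java.util.regex.Pattern;

class PredefinedClassesDemo {
    // Hypothetical helper: true if the whole of text matches regEx.
    static boolean matchesAll(String regEx, String text) {
        return Pattern.matches(regEx, text);
    }

    public static void main(String[] args) {
        String verticalTab = String.valueOf((char) 0x0B);

        System.out.println(matchesAll("\\s", "\t"));        // true: a tab is whitespace
        System.out.println(matchesAll("\\s", verticalTab)); // true: so is a vertical tab
        System.out.println(matchesAll("\\S", "x"));         // true: not whitespace
        System.out.println(matchesAll("\\w", "_"));         // true: underscore is a word character
        System.out.println(matchesAll("\\w", "-"));         // false: hyphen is not
        System.out.println(matchesAll("\\W", "-"));         // true: anything that is not \w
    }
}
```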

Note that when you are using any of the sequences that start with a backslash in a regular expression, you need to keep in mind that Java treats a backslash in a string literal as the beginning of an escape sequence. Therefore, you must specify the backslash in the regular expression as \\. For example, to find a sequence of three digits, the regular expression would be "\\d\\d\\d". This is a consequence of the significance of the backslash in Java string literals, so it doesn't necessarily apply to other environments that support regular expressions, such as Perl.
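To make the doubling concrete, here is a small sketch that searches for three consecutive digits. The class name ThreeDigitsDemo, the method firstThreeDigits(), and the sample text are invented for this illustration. Note that the literal "\\d\\d\\d" contains only six characters once the compiler has processed it, so the regular expression engine sees \d\d\d.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ThreeDigitsDemo {
    // Hypothetical helper: the first run of three digits in text, or null if there is none.
    static String firstThreeDigits(String text) {
        Matcher m = Pattern.compile("\\d\\d\\d").matcher(text);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // Each \\ in the source becomes a single backslash in the string.
        System.out.println("\\d\\d\\d".length());                         // 6
        System.out.println(firstThreeDigits("Room 4217 is on floor 42")); // 421
        System.out.println(firstThreeDigits("floor 42"));                 // null
    }
}
```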

Obviously, you may well want to include a period, or any of the other meta-characters, as part of the character sequence you are looking for. To do this you can use an escape sequence starting with a backslash in the expression to define such characters. Because Java strings interpret a backslash as the start of a Java escape sequence, the backslash itself has to be represented as \\, the same as when using the predefined character sets that begin with a backslash. Thus, the regular expression to find the sequence "had." would be "had\\.".
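The difference between the escaped and unescaped period can be seen by counting matches. This is only a sketch: the class EscapedPeriodDemo, the helper countMatches(), and the sample sentence are assumptions made for the illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EscapedPeriodDemo {
    // Hypothetical helper: the number of non-overlapping matches of regEx in text.
    static int countMatches(String regEx, String text) {
        Matcher m = Pattern.compile(regEx).matcher(text);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String text = "He had. She hadn't had it.";

        // Unescaped, the period matches any character: "had.", "hadn", and "had ".
        System.out.println(countMatches("had.", text));   // 3

        // Escaped, it matches only a literal period: "had.".
        System.out.println(countMatches("had\\.", text)); // 1
    }
}
```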

The earlier search you tried with the expression "h.d" found embedded sequences such as "hud" in the word huddled. You could use the \s set that corresponds to any whitespace character to prevent this by defining regEx like this:

String regEx = "\\sh.d\\s";

This searches for a five-character sequence that starts and ends with any whitespace character. The output from the example is now:

Ted and Ned Hodge hid their hod and huddled in the hedge.
                 ^^^^^     ^^^^^

You can see that the marker array shows the five-character sequences that were found. The embedded sequences are now no longer included, as they don't begin and end with a whitespace character.
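For readers who want to reproduce this output on its own, the following sketch builds the marker line with Matcher.start() and Matcher.end(). It is not the original example program: the class WhitespaceDelimitedSearch and the method markMatches() are names chosen for this illustration.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WhitespaceDelimitedSearch {
    // Hypothetical helper: a line of '^' markers under every match of regEx in text.
    static String markMatches(String regEx, String text) {
        char[] marker = new char[text.length()];
        Arrays.fill(marker, ' ');
        Matcher m = Pattern.compile(regEx).matcher(text);
        while (m.find()) {
            // end() is one past the last matched character, which suits fill().
            Arrays.fill(marker, m.start(), m.end(), '^');
        }
        return new String(marker);
    }

    public static void main(String[] args) {
        String text = "Ted and Ned Hodge hid their hod and huddled in the hedge.";
        System.out.println(text);
        System.out.println(markMatches("\\sh.d\\s", text));
    }
}
```

Because each match includes the whitespace on both sides, the markers extend one position beyond each word in either direction.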

To take another example, suppose you want to find hedge or Hodge as words in the sentence, bearing in mind that there's a period at the end. You could do this by defining the regular expression as:

String regEx = "\\s[hH][eo]dge[\\s.]";

The first character is defined as any whitespace by \\s. The next character is defined as either "h" or "H" by [hH]. This can be followed by either "e" or "o", specified by [eo]. This is followed by the plain text dge with either a whitespace character or a period at the end, specified by [\\s.]. Inside square brackets you simply list the alternatives: a | there would be treated as a literal character to match, and a period needs no escape. This doesn't cater to all possibilities. Sequences at the beginning of the string are not found, for example, nor are sequences followed by a comma. You see how to deal with these next.
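The limitations just mentioned are easy to demonstrate. The sketch below writes the sets as [hH], [eo], and [\\s.], since alternatives inside square brackets need no | separator. The class HedgeHodgeSearch, the method findAll(), and the second test sentence are assumptions made for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class HedgeHodgeSearch {
    // Hypothetical helper: every non-overlapping match of regEx in text, in order.
    static List<String> findAll(String regEx, String text) {
        List<String> found = new ArrayList<>();
        Matcher m = Pattern.compile(regEx).matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String regEx = "\\s[hH][eo]dge[\\s.]";

        // Both words are found, together with their delimiters.
        System.out.println(findAll(regEx,
                "Ted and Ned Hodge hid their hod and huddled in the hedge."));

        // Nothing is found: "Hedge" starts the string and "hodge" is followed by a comma.
        System.out.println(findAll(regEx,
                "Hedge cutters trimmed the hodge, slowly."));
    }
}
```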

Matching Boundaries
