A Collection of Useful Classes - Beginning Java 2 SDK

String regEx = "\\sh.d\\s";

This searches for a five-character sequence that starts and ends with any whitespace character. The output from the example will now be:

Ted and Ned Hodge hid their hod and huddled in the hedge.
                 ^^^^^     ^^^^^

You can see that the marker array shows the five-character sequences that were found. The embedded sequences are no longer included, as they don't begin and end with a whitespace character.
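
As a rough illustration, the search and its marker line could be reproduced along the following lines. This is a minimal sketch rather than the example program itself: the class name WhitespaceDelimitedSearch and the helper markMatches() are hypothetical, and only the standard Pattern and Matcher classes from java.util.regex are assumed.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical class name, used for illustration only.
class WhitespaceDelimitedSearch {

    // Returns a string the same length as text, with '^' under every
    // character that belongs to a match of regEx and ' ' elsewhere.
    static String markMatches(String regEx, String text) {
        char[] marker = new char[text.length()];
        Arrays.fill(marker, ' ');
        Matcher m = Pattern.compile(regEx).matcher(text);
        while (m.find()) {
            // start() is inclusive, end() is exclusive
            Arrays.fill(marker, m.start(), m.end(), '^');
        }
        return new String(marker);
    }

    public static void main(String[] args) {
        String text = "Ted and Ned Hodge hid their hod and huddled in the hedge.";
        String regEx = "\\sh.d\\s";
        System.out.println(text);
        System.out.println(markMatches(regEx, text));
    }
}
```

Because each match consumes the whitespace on both sides, two qualifying words in a row, as in " hid hod ", would yield only the first: no whitespace is left in front of the second for \\s to match.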

To take another example, suppose we want to find hedge or Hodge as words in the sentence, bearing in mind that there's a period at the end. We could do this by defining the regular expression as:

String regEx = "\\s[hH][eo]dge[\\s.]";

The first character is defined as any whitespace by \\s. The next character is defined as either 'h' or 'H' by [hH], and this can be followed by either 'e' or 'o', specified by [eo]. No | is needed between the alternatives: inside square brackets a | is just another literal character to match. This is followed by the plain text dge, with either a whitespace character or a period at the end, specified by [\\s.] (a period inside square brackets needs no escape). This doesn't cater for all possibilities. Sequences at the beginning of the string will not be found, for instance, nor will sequences followed by a comma. We'll see how to deal with these next.
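
To make the limitation concrete, here is a small sketch that applies this kind of expression to two strings. The character classes are written without | between the alternatives, since a | inside square brackets is matched literally. The class name HedgeSearch, the helper findAll(), and the second test sentence are assumptions made up for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical class name, used for illustration only.
class HedgeSearch {

    // Collects the text of every match of regEx found in text.
    static List<String> findAll(String regEx, String text) {
        List<String> found = new ArrayList<>();
        Matcher m = Pattern.compile(regEx).matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String regEx = "\\s[hH][eo]dge[\\s.]";

        // Both words are surrounded by whitespace or followed by a period,
        // so two matches are found: " Hodge " and " hedge."
        System.out.println(findAll(regEx,
            "Ted and Ned Hodge hid their hod and huddled in the hedge."));

        // Invented sentence: Hodge starts the string and hedge is followed
        // by a comma, so nothing is found and an empty list is printed.
        System.out.println(findAll(regEx,
            "Hodge sat in the hedge, waiting."));
    }
}
```

Note that each match includes the surrounding whitespace or period, which is one more reason to prefer a pattern that tests for a boundary without consuming it.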

Matching Boundaries

So far we have tried to find the occurrence of a pattern anywhere in a string. In many situations you will want to be more specific. You may want to look for a pattern that appears at the beginning of a line in a string but not anywhere else, or maybe just at the end of any line. As we saw in the previous example, you may want to look for a word that is not embedded: you want to find the word "cat" but not the "cat" in "cattle" or in "Popocatepetl", for instance. The previous example worked for the string we were searching, but it would not produce the right result if the word we were looking for was followed by a comma or appeared at the beginning or end of the text. However, we have other options. There are a number of special sequences you can use in a regular expression when you want to match a particular boundary. These are especially useful:

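^
    Specifies the beginning of a line. For example, to find the word Java at the beginning of any line you could use the expression "^Java". In Java, this matches at the start of every line only when the pattern is compiled with the Pattern.MULTILINE flag; by default it matches only at the start of the input.

$
    Specifies the end of a line. For example, to find the word Java at the end of any line you could use the expression "Java$". Of course, if you were expecting a period at the end of a line the expression would be "Java\\.$". Without Pattern.MULTILINE, $ matches only at the end of the input (or just before a final line terminator).

\b
    Specifies a word boundary. To find three-character words beginning with 'h' and ending with 'd' we could use the expression "\\bh.d\\b".

\B
    A non-word boundary, the complement of \b above.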
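
The boundary sequences listed above can be tried out with a short program such as the following. It is only a sketch: the class name BoundaryDemo, the helper findAll(), and the sample strings other than the Ted and Ned sentence are invented for illustration. The Pattern.MULTILINE flag is part of the standard java.util.regex.Pattern class.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical class name, used for illustration only.
class BoundaryDemo {

    // Collects the text of every match of the compiled pattern in text.
    static List<String> findAll(Pattern pattern, String text) {
        List<String> found = new ArrayList<>();
        Matcher m = pattern.matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String text = "Ted and Ned Hodge hid their hod and huddled in the hedge.";

        // \b matches a position, not a character, so the matches contain
        // only the words themselves: [hid, hod]
        System.out.println(findAll(Pattern.compile("\\bh.d\\b"), text));

        // Word boundaries also cope with the start of the string and with
        // a following comma: [Hodge, hedge]
        System.out.println(findAll(Pattern.compile("\\b[hH][eo]dge\\b"),
                                   "Hodge sat in the hedge, waiting."));

        // Invented three-line input for the line anchors.
        String lines = "Java is fun\nI like Java\nJava";

        // Default mode: ^ matches only at the start of the input. Prints 1
        System.out.println(findAll(Pattern.compile("^Java"), lines).size());

        // MULTILINE: ^ matches at the start of lines 1 and 3. Prints 2
        System.out.println(
            findAll(Pattern.compile("^Java", Pattern.MULTILINE), lines).size());

        // MULTILINE: $ matches at the end of lines 2 and 3. Prints 2
        System.out.println(
            findAll(Pattern.compile("Java$", Pattern.MULTILINE), lines).size());
    }
}
```

Since \b consumes no characters, adjacent words can both be found, and the matched text needs no trimming before use.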
