Java Reference
In-Depth Information
String regEx = "\\sh.d\\s";
This searches for a five-character sequence that starts and ends with any whitespace character, with 'h',
any single character, and 'd' in between. The output from the example will now be:
Ted and Ned Hodge hid their hod and huddled in the hedge.
                 ^^^^^     ^^^^^
You can see that the marker array shows the five-character sequences that were found. The embedded
sequences are now no longer included, as they don't begin and end with a whitespace character.
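As a minimal sketch of how such a marker line could be produced, the following builds a row of '^' characters under each match. The class name MarkMatches and the method markerLine() are hypothetical names chosen for illustration; they are not the code of the original example.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, named here only for illustration.
class MarkMatches {
    // Returns a line the same length as text, with '^' under every
    // character that belongs to a match of regEx and spaces elsewhere.
    static String markerLine(String regEx, String text) {
        char[] marker = new char[text.length()];
        Arrays.fill(marker, ' ');
        Matcher m = Pattern.compile(regEx).matcher(text);
        while (m.find()) {
            // start() is inclusive and end() is exclusive
            Arrays.fill(marker, m.start(), m.end(), '^');
        }
        return new String(marker);
    }

    public static void main(String[] args) {
        String text = "Ted and Ned Hodge hid their hod and huddled in the hedge.";
        System.out.println(text);
        System.out.println(markerLine("\\sh.d\\s", text));
    }
}
```

Only " hid " and " hod " are marked, each together with its surrounding spaces, because the whitespace characters are part of the match.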
To take another example, suppose we want to find hedge or Hodge as words in the sentence, bearing in
mind that there's a period at the end. We could do this by defining the regular expression as:
String regEx = "\\s[hH][eo]dge[\\s.]";
The \\s at the start defines the first character as any whitespace. The next character is defined as either
'h' or 'H' by the character class [hH] , and it can be followed by either 'e' or 'o', specified by [eo] . This
is followed by the plain text dge with either a whitespace character or a period at the end, specified by
[\\s.] . Note that a character class simply lists its alternatives: inside square brackets | is an ordinary
character, so writing [h|H] would also match a literal '|', and a period needs no escape there. This
expression doesn't cater for all possibilities. Sequences at the beginning of the string will not be found,
for instance, nor will sequences followed by a comma. We'll see how to deal with these next.
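To try the search for hedge or Hodge, a short sketch along the following lines could be used. It writes the alternatives as plain character classes such as [hH], since | has no special meaning inside square brackets. The class FindHedge and its findAll() method are hypothetical names introduced for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, named here only for illustration.
class FindHedge {
    // Collects the text of every match of regEx found in text.
    static List<String> findAll(String regEx, String text) {
        List<String> found = new ArrayList<>();
        Matcher m = Pattern.compile(regEx).matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String regEx = "\\s[hH][eo]dge[\\s.]";
        String text = "Ted and Ned Hodge hid their hod and huddled in the hedge.";
        // Brackets make the matched whitespace and period visible
        for (String s : findAll(regEx, text)) {
            System.out.println("[" + s + "]");
        }
        // Nothing is found here: the first word has no whitespace before it,
        // and the second has neither whitespace nor a period after it.
        System.out.println(findAll(regEx, "Hedge, the hodge"));
    }
}
```

The second call returns an empty list, which illustrates the limitation described above.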
Matching Boundaries
So far we have tried to find the occurrence of a pattern anywhere in a string. In many situations you will
want to be more specific. You may want to look for a pattern that appears at the beginning of a line in a
string but not anywhere else, or maybe just at the end of any line. As we saw in the previous example,
you may want to look for a word that is not embedded in another: you want to find the word "cat" but not the
"cat" in "cattle" or in "Popocatepetl", for instance. The previous example worked for the string
we were searching but would not produce the right result if the word we were looking for was followed
by a comma or appeared at the end of the text. However, we have other options. There are a number of
special sequences you can use in a regular expression when you want to match a particular boundary.
For instance, these are especially useful:
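^
Specifies the beginning of a line. For example, to find the word Java at the beginning of
any line you could use the expression "^Java". By default ^ matches only at the very beginning
of the input; compile the pattern with the Pattern.MULTILINE flag to make it match at the start
of every line.
$
Specifies the end of a line. For example, to find the word Java at the end of any line you
could use the expression "Java$". Of course, if you were expecting a period at the end of
a line the expression would be "Java\\.$". As with ^, the Pattern.MULTILINE flag is needed to
match at the end of every line; by default $ matches only at the end of the input (or just
before a final line terminator).
\b
Specifies a word boundary. To find three-letter words beginning with 'h' and ending with 'd' we
could use the expression "\\bh.d\\b".
\B
Specifies a non-word boundary, the complement of \b above.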
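The boundary sequences above can be tried out with a small sketch such as the following. The class BoundaryDemo, its count() method, and the sample strings are assumptions made for illustration; Pattern.MULTILINE is the standard flag that makes ^ and $ match at the start and end of each line rather than only at the ends of the whole input.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the name and sample strings are illustrative only.
class BoundaryDemo {
    // Counts how many times the pattern is found in the text.
    static int count(Pattern pattern, String text) {
        Matcher m = pattern.matcher(text);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        String sentence = "Ted and Ned Hodge hid their hod and huddled in the hedge.";
        String words = "cat cattle Popocatepetl";
        String lines = "Java first\nthen Java\nJava again";

        // \b: whole words only, so "hid" and "hod" but not the "hud" in "huddled"
        System.out.println(count(Pattern.compile("\\bh.d\\b"), sentence));    // 2
        // \b finds only the stand-alone word "cat"
        System.out.println(count(Pattern.compile("\\bcat\\b"), words));       // 1
        // \B finds only the "cat" buried inside "Popocatepetl"
        System.out.println(count(Pattern.compile("\\Bcat\\B"), words));       // 1

        // Without MULTILINE, ^ matches only at the start of the whole input
        System.out.println(count(Pattern.compile("^Java"), lines));           // 1
        // With MULTILINE, ^ matches at the start of every line
        System.out.println(count(Pattern.compile("^Java", Pattern.MULTILINE), lines)); // 2
        // With MULTILINE, $ matches at the end of every line
        System.out.println(count(Pattern.compile("Java$", Pattern.MULTILINE), lines)); // 1
    }
}
```

Unlike the \\s approach, \b consumes no characters, so the matches contain only the word itself and words at the very start or end of the text are found too.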