So far you have been trying to find the occurrence of a pattern anywhere in a string. In many situations you will want to be more specific. You may want to look for a pattern that appears at the beginning of a line in a string but not anywhere else, or maybe just at the end of any line. As you saw in the previous example, you may want to look for a word that is not embedded: you want to find the word "cat" but not the "cat" in "cattle" or in "Popocatepetl", for example. The previous example worked for the string you were searching but would not produce the right result if the word you were looking for was followed by a comma or appeared at the end of the text. However, you have other options for specifying the pattern. You can use a number of special sequences in a regular expression when you want to match a particular boundary. For example, those presented in Table 15-7 are especially useful:
^     Specifies the beginning of a line. For example, to find the word Java at the beginning of any line you could use the expression "^Java". With the default flags, ^ matches only at the beginning of the input; compile the pattern with Pattern.MULTILINE to match at the beginning of every line.
$     Specifies the end of a line. For example, to find the word Java at the end of any line you could use the expression "Java$". Of course, if you were expecting a period at the end of a line the expression would be "Java\\.$". As with ^, matching at the end of every line rather than only at the end of the input requires Pattern.MULTILINE.
\b    Specifies a word boundary. To find three-letter words beginning with 'h' and ending with 'd', you could use the expression "\\bh.d\\b".
\B    A non-word boundary, the complement of \b.
\A    Specifies the beginning of the string being searched. To find the word The at the very beginning of the string being searched, you could use the expression "\\AThe\\b". The \\b at the end of the regular expression is necessary to avoid finding Then or There at the beginning of the input.
\z    Specifies the end of the string being searched. To find the word hedge followed by a period at the end of a string, you could use the expression "\\bhedge\\.\\z".
\Z    The end of input except for the final terminator. The final terminator can only be a newline character ('\n') if Pattern.UNIX_LINES is set. Otherwise, it can also be a carriage return ('\r'), a carriage return followed by a newline character, a next-line character ('\u0085'), a line separator ('\u2028'), or a paragraph separator ('\u2029').
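The boundary sequences in the table can be tried out with a short program. The following is only a sketch: the class name BoundaryMatchers, the helper method starts(), and the sample text are all invented for illustration. It records the index at which each match begins, and it compiles the ^ and $ patterns both with and without Pattern.MULTILINE so you can see the difference the flag makes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BoundaryMatchers {
    // Hypothetical helper: returns the start index of every match of regex in input.
    static List<Integer> starts(String regex, int flags, String input) {
        Matcher m = Pattern.compile(regex, flags).matcher(input);
        List<Integer> result = new ArrayList<>();
        while (m.find()) {
            result.add(m.start());
        }
        return result;
    }

    public static void main(String[] args) {
        // Sample text (an assumption for this sketch); note the final newline.
        String text = "Java is fun.\nI like Java\nJava rocks, cat, cattle.\n";

        // Default flags: ^ and $ refer to the whole input, not to each line.
        System.out.println(starts("^Java", 0, text));                  // [0]
        System.out.println(starts("Java$", 0, text));                  // []

        // MULTILINE: ^ and $ match at the start and end of every line.
        System.out.println(starts("^Java", Pattern.MULTILINE, text));  // [0, 25]
        System.out.println(starts("Java$", Pattern.MULTILINE, text));  // [20]

        // \b finds "cat" followed by a comma but skips the "cat" in "cattle".
        System.out.println(starts("\\bcat\\b", 0, text));              // [37]

        // \A anchors to the very beginning of the input.
        System.out.println(starts("\\AJava", 0, text));                // [0]

        // \z demands the true end of input, so the trailing newline defeats it;
        // \Z tolerates one final line terminator.
        System.out.println(starts("cattle\\.\\z", 0, text));           // []
        System.out.println(starts("cattle\\.\\Z", 0, text));           // [42]
    }
}
```

The last two lines show why the choice between \z and \Z matters when the text you are searching was read from a file and still carries its final line terminator.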
Although you have moved quite a way from the simple search for a fixed substring offered by the String class methods, you still can't search for sequences that may vary in length. If you wanted to find all the numerical values in a string, which might be sequences such as 1234 or 23.45 or 999.998, for example, you don't yet have the ability to do that. You can fix that now by taking a look at quantifiers in a regular expression and what they can do for you.
Using Quantifiers
A quantifier following a subsequence of a pattern determines how that subsequence can repeat. Let's take an example. Suppose you want to find any numerical values in a string. If you take the simplest case, you can say an integer is an arbitrary sequence of one or more digits. The quantifier for one or more is the meta-character "+". You have also seen that you can use \d as shorthand for any digit (remembering, of course, that it becomes \\d in a Java String literal), so you could express any sequence of digits as the regular expression:
"\\d+"