So far you have been searching for a pattern anywhere in a string. In many situations you will want to be
more specific. You may want to look for a pattern that appears at the beginning of a line in a string but not
anywhere else, or perhaps only at the end of a line. As you saw in the previous example, you may want to
look for a word that is not embedded in another word: you want to find the word "cat" but not the "cat"
in "cattle" or in "Popocatepetl", for example. The previous example worked for the string you were
searching but would not produce the right result if the word you were looking for was followed by a comma
or appeared at the end of the text. However, you have other options for specifying the pattern. You can use
a number of special sequences in a regular expression when you want to match a particular boundary. Those
presented in Table 15-7 are especially useful:
TABLE 15-7: Boundary Matching in a Regular Expression
SEQUENCE BOUNDARY MATCHED
^
Specifies the beginning of a line. For example, to find the word Java at the beginning of any line you
could use the expression "^Java". Note that ^ matches at the start of every line only when
Pattern.MULTILINE is set; by default it matches only at the beginning of the input.
$
Specifies the end of a line. For example, to find the word Java at the end of any line you could use the
expression "Java$". Of course, if you were expecting a period at the end of a line the expression would
be "Java\\.$". As with ^, $ matches at the end of every line only when Pattern.MULTILINE is set.
\b
Specifies a word boundary. To find three-letter words beginning with 'h' and ending with 'd', you
could use the expression "\\bh.d\\b".
\B
A non-word boundary: the complement of \b.
\A
Specifies the beginning of the string being searched. To find the word The at the very beginning of the
string being searched, you could use the expression "\\AThe\\b". The \\b at the end of the regular
expression is necessary to avoid finding Then or There at the beginning of the input.
\z
Specifies the end of the string being searched. To find the word hedge followed by a period at the end of
a string, you could use the expression "\\bhedge\\.\\z".
\Z
The end of input except for the final terminator. A final terminator is a newline character ('\n') if
Pattern.UNIX_LINES is set. Otherwise, it can also be a carriage return ('\r'), a carriage return followed by
a newline character, a next-line character ('\u0085'), a line separator ('\u2028'), or a paragraph
separator ('\u2029').
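The boundary sequences in the table can be tried out with a short program. The following is only a sketch: the class name BoundaryDemo, the helper matchStarts(), and the sample strings are made up for illustration and do not come from the text. It lists the index at which each match starts, so you can see which occurrences a boundary rules out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BoundaryDemo {
    // Hypothetical helper: collects the start index of every match of regex in text.
    static List<Integer> matchStarts(String regex, int flags, String text) {
        Matcher m = Pattern.compile(regex, flags).matcher(text);
        List<Integer> starts = new ArrayList<>();
        while (m.find()) {
            starts.add(m.start());
        }
        return starts;
    }

    public static void main(String[] args) {
        String pets = "The cat, the cattle and the cat";
        // Plain "cat" also matches inside "cattle".
        System.out.println(matchStarts("cat", 0, pets));          // [4, 13, 28]
        // \b keeps only the standalone word, even before a comma or at the end of the text.
        System.out.println(matchStarts("\\bcat\\b", 0, pets));    // [4, 28]

        String lines = "C is old\nJava is fun\nI like Java";
        // Without MULTILINE, ^ matches only at the start of the whole input.
        System.out.println(matchStarts("^Java", 0, lines));                  // []
        // With MULTILINE, ^ also matches just after each line terminator.
        System.out.println(matchStarts("^Java", Pattern.MULTILINE, lines));  // [9]

        // \A anchors to the start of the input; \b rejects "Then" and "There".
        System.out.println(matchStarts("\\AThe\\b", 0, "The end"));    // [0]
        System.out.println(matchStarts("\\AThe\\b", 0, "Then what"));  // []

        // \z needs the very end of the input; \Z tolerates one final line terminator.
        String trailing = "trim the hedge.\n";
        System.out.println(matchStarts("\\bhedge\\.\\z", 0, trailing));  // []
        System.out.println(matchStarts("\\bhedge\\.\\Z", 0, trailing));  // [9]
    }
}
```

Passing the flags as a parameter makes it easy to compare the default behavior of ^ with its behavior under Pattern.MULTILINE on the same input.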
Although you have moved quite a way from the simple search for a fixed substring offered by the String
class methods, you still can't search for sequences that may vary in length. If you wanted to find all the
numerical values in a string, which might be sequences such as 1234 or 23.45 or 999.998 , for example,
you don't yet have the ability to do that. You can fix that now by taking a look at quantifiers in a regular
expression and what they can do for you.
Using Quantifiers
A quantifier following a subsequence of a pattern determines how many times that subsequence can repeat.
Let's take an example. Suppose you want to find any numerical values in a string. In the simplest case, an
integer is an arbitrary sequence of one or more digits. The quantifier for one or more is the
metacharacter "+". You have also seen that you can use \d as shorthand for any digit (remembering, of
course, that it becomes \\d in a Java String literal), so you could express any sequence of digits as the
regular expression:
"\\d+"
 
 