So far you have been trying to find the occurrence of a pattern anywhere in a string. In many situations you
will want to be more specific. You may want to look for a pattern that appears at the beginning of a line in a
string but not anywhere else, or maybe just at the end of any line. As you saw in the previous example, you
may want to look for a word that is not embedded in another word: you want to find the word "cat" but not the "cat"
in "cattle" or in "Popacatapetl", for example. The previous example worked for the string you were
searching but would not produce the right result if the word you were looking for was followed by a comma
or appeared at the end of the text. However, you have other options for specifying the pattern. You can use
a number of special sequences in a regular expression when you want to match a particular boundary. For
example, those presented in Table 15-7 are especially useful:
TABLE 15-7: Boundary Matching in a Regular Expression
^    Specifies the beginning of a line. For example, to find the word Java at the beginning of any line you
     could use the expression "^Java". By default this matches only at the beginning of the input; set
     Pattern.MULTILINE to match at the beginning of every line.
$    Specifies the end of a line. For example, to find the word Java at the end of any line you could use the
     expression "Java$". Of course, if you were expecting a period at the end of a line the expression would
     be "Java\\.$". By default this matches only at the end of the input (or just before a final line
     terminator); set Pattern.MULTILINE to match at the end of every line.
\b   Specifies a word boundary. To find three-letter words beginning with 'h' and ending with 'd', you
     could use the expression "\\bh.d\\b".
\B   A non-word boundary, the complement of \b.
\A   Specifies the beginning of the string being searched. To find the word The at the very beginning of the
     string being searched, you could use the expression "\\AThe\\b". The \\b at the end of the regular
     expression is necessary to avoid finding Then or There at the beginning of the input.
\z   Specifies the end of the string being searched. To find the word hedge followed by a period at the end of
     a string, you could use the expression "\\bhedge\\.\\z".
\Z   The end of input except for the final terminator. A final terminator is a newline character ('\n') if
     Pattern.UNIX_LINES is set. Otherwise, it can also be a carriage return ('\r'), a carriage return followed by
     a newline character, a next-line character ('\u0085'), a line separator ('\u2028'), or a paragraph
     separator ('\u2029').
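The boundary matchers in Table 15-7 can be tried out with a short sketch like the one below. The class name BoundaryDemo, the helper method findAll(), and the sample strings are assumptions made for illustration; they are not taken from the earlier examples. The sketch relies only on Pattern.compile(), Matcher.find(), and Matcher.group() from java.util.regex.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the name and sample text are illustrative only.
public class BoundaryDemo {

    // Collects every match of regex in input, in order of appearance.
    // flags is passed straight to Pattern.compile(); use 0 for no flags.
    public static List<String> findAll(String regex, int flags, String input) {
        Matcher m = Pattern.compile(regex, flags).matcher(input);
        List<String> found = new ArrayList<>();
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        // \b finds "cat" as a whole word, even before a comma or at the end of the text.
        String cats = "The cat, not cattle or Popacatapetl, saw a cat";
        System.out.println(findAll("\\bcat\\b", 0, cats));              // [cat, cat]

        // ^ and $ match at every line only when Pattern.MULTILINE is set.
        String lines = "Java is here.\nI like Java\nJava";
        System.out.println(findAll("^Java", 0, lines).size());                  // 1
        System.out.println(findAll("^Java", Pattern.MULTILINE, lines).size());  // 2
        System.out.println(findAll("Java$", Pattern.MULTILINE, lines).size());  // 2

        // \A anchors to the start of the whole input; the trailing \b rejects "Then".
        System.out.println(findAll("\\AThe\\b", 0, "The end"));         // [The]
        System.out.println(findAll("\\AThe\\b", 0, "Then what"));       // []

        // \z demands the very end of input; \Z tolerates one final line terminator.
        System.out.println(findAll("\\bhedge\\.\\z", 0, "a hedge.\n")); // []
        System.out.println(findAll("\\bhedge\\.\\Z", 0, "a hedge.\n")); // [hedge.]
    }
}
```

Passing the flags as a plain int keeps the helper small; in a real program you would probably compile each Pattern once and reuse it rather than recompiling on every call.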
Although you have moved quite a way from the simple search for a fixed substring offered by the String
class methods, you still can't search for sequences that may vary in length. If you wanted to find all the
numerical values in a string, which might be sequences such as 1234 or 23.45 or 999.998 , for example,
you don't yet have the ability to do that. You can fix that now by taking a look at quantifiers in a regular
expression and what they can do for you.
Using Quantifiers
A quantifier following a subsequence of a pattern determines how many times that subsequence can
repeat. Let's take an example. Suppose you want to find any numerical values in a string. In the
simplest case, an integer is an arbitrary sequence of one or more digits. The quantifier
for one or more is the meta-character "+". You have also seen that you can use \d as shorthand for any digit
(remembering, of course, that it becomes \\d in a Java String literal), so you could express any sequence
of digits as the regular expression:
"\\d+"
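As a minimal sketch of the "one or more digits" idea, the following applies the + quantifier to \d and collects each run of digits from a string. The class name DigitRuns, the method findIntegers(), and the sample sentence are hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the names here are illustrative assumptions.
public class DigitRuns {

    // \d+ written as a Java String literal: one or more consecutive digits.
    private static final Pattern DIGITS = Pattern.compile("\\d+");

    // Returns each maximal run of digits found in input, in order.
    public static List<String> findIntegers(String input) {
        Matcher m = DIGITS.matcher(input);
        List<String> runs = new ArrayList<>();
        while (m.find()) {
            runs.add(m.group());
        }
        return runs;
    }

    public static void main(String[] args) {
        System.out.println(findIntegers("Order 1234 shipped 23 items in 7 days"));
        // prints [1234, 23, 7]
    }
}
```

Because this pattern knows nothing about decimal points, a value such as 23.45 is reported as two separate runs, 23 and 45; it covers only the simplest, integer case.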