Java Reference
In-Depth Information
\A
Specifies the beginning of the string being searched. To find the word The at the very
beginning of the string being searched you could use the expression "\\AThe\\b". The
\\b at the end of the regular expression is necessary to avoid finding Then or There at
the beginning of the input.
\z
Specifies the end of the string being searched. To find the word hedge followed by a
period at the end of a string you could use the expression "\\bhedge\\.\\z".
\Z
Specifies the end of the input except for the final line terminator, if there is one. The
final terminator can only be a newline character ('\n') if Pattern.UNIX_LINES is set.
Otherwise it can also be a carriage return ('\r'), a carriage return followed by a newline,
a next-line character ('\u0085'), a line separator ('\u2028'), or a paragraph separator
('\u2029').
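The difference between these three boundary matchers is easiest to see on a string that ends with a newline. The following is a minimal sketch rather than code from the text: the class name AnchorDemo, its method names, and the sample strings are assumptions made up for illustration, while the regular expressions themselves are the ones described above.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; names and sample inputs are illustrative only.
class AnchorDemo {
    // True if the input begins with the whole word "The".
    static boolean startsWithThe(String input) {
        return Pattern.compile("\\AThe\\b").matcher(input).find();
    }

    // True only if "hedge." is the very last thing in the input.
    static boolean endsWithHedgeStrict(String input) {
        return Pattern.compile("\\bhedge\\.\\z").matcher(input).find();
    }

    // Like the strict version, but \Z ignores one final line terminator.
    static boolean endsWithHedgeLenient(String input) {
        return Pattern.compile("\\bhedge\\.\\Z").matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(startsWithThe("The cat sat"));               // true
        System.out.println(startsWithThe("Then it left"));              // false
        System.out.println(endsWithHedgeStrict("Trim the hedge."));     // true
        System.out.println(endsWithHedgeStrict("Trim the hedge.\n"));   // false
        System.out.println(endsWithHedgeLenient("Trim the hedge.\n"));  // true
    }
}
```

Text read a line at a time often still carries its trailing newline, which is the situation where \Z is likely to be more convenient than \z.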
While we have moved quite a way from the simple search for a fixed substring offered by the String
class methods, we still can't search for sequences that may vary in length. If we wanted to find all the
numerical values in a string, which might be sequences such as 1234, 23.45, or 999.998,
we don't yet have the ability to do that. Let's fix that now by taking a look at quantifiers in a
regular expression, and what they can do for us.
Using Quantifiers
A quantifier following a subsequence of a pattern determines how many times that subsequence
can repeat. Let's take an example. Suppose we want to find any numerical values in a string.
If we take the simplest case we can say an integer is an arbitrary sequence of one or more digits. The
quantifier for one or more is the meta-character + . We have also seen that we can use \d as shorthand
for any digit (remembering of course that it becomes \\d in a Java string literal), so we could express
any sequence of digits as the regular expression:
"\\d+"
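To see what this expression finds in practice, here is a small sketch that collects every match in a string. The class name DigitRuns, the method findDigitRuns, and the sample input are assumptions introduced for illustration; only the pattern comes from the text.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; names and sample input are illustrative only.
class DigitRuns {
    // Returns every maximal run of one or more digits, in order of appearance.
    static List<String> findDigitRuns(String input) {
        List<String> runs = new ArrayList<>();
        Matcher matcher = Pattern.compile("\\d+").matcher(input);
        while (matcher.find()) {
            runs.add(matcher.group());
        }
        return runs;
    }

    public static void main(String[] args) {
        // prints [1234, 23, 45]
        System.out.println(findDigitRuns("Order 1234 costs 23.45"));
    }
}
```

Note that 23.45 comes back as two separate matches, 23 and 45, because \d+ knows nothing about the decimal point.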
Of course, a number may also include a decimal point and may be optionally followed by further digits.
To indicate something can occur just once or not at all, as is the case with a decimal point, we can use
the quantifier ? . We can write the pattern for a sequence of digits followed by a decimal point as:
"\\d+\\.?"
To add the possibility of further digits we can append \\d+ to what we have so far to produce
the expression:
"\\d+\\.?\\d+"
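This is a bit untidy. It also demands at least two digits in total, because each \\d+ must match at
least one digit, so a single digit such as 7 would not match. We can rewrite this as an integral part
followed by an optional fractional part by putting parentheses around the bit for the fractional part
and adding the ? quantifier: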
"\\d+(\\.\\d+)?"
However, this isn't quite right. We can have 2. as a valid numerical value, for instance, so we want to
specify zero or more appearances of digits in the fractional part. The * quantifier expresses that, so
maybe we should use:
"\\d+(\\.\\d*)?"
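One way to compare the candidate expressions developed so far is to test each against a few sample values. The sketch below is illustrative only: the class name NumberPatterns, the constant names, and the helper isNumber are assumptions, and it uses Pattern.matches(), which requires the whole string to match rather than searching within it.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; names and sample values are illustrative only.
class NumberPatterns {
    static final String TWO_DIGIT_RUNS = "\\d+\\.?\\d+";     // needs digits on both sides
    static final String OPTIONAL_FRACTION = "\\d+(\\.\\d+)?"; // fraction needs a digit
    static final String OPTIONAL_DIGITS = "\\d+(\\.\\d*)?";   // digits after the point optional

    // True if the entire text matches the given regular expression.
    static boolean isNumber(String regex, String text) {
        return Pattern.matches(regex, text);
    }

    public static void main(String[] args) {
        String[] samples = {"7", "1234", "23.45", "2."};
        for (String sample : samples) {
            System.out.println(sample
                    + " " + isNumber(TWO_DIGIT_RUNS, sample)
                    + " " + isNumber(OPTIONAL_FRACTION, sample)
                    + " " + isNumber(OPTIONAL_DIGITS, sample));
        }
        // prints:
        // 7 false true true
        // 1234 true true true
        // 23.45 true true true
        // 2. false false true
    }
}
```

Only the last expression accepts all four samples. The first rejects a single digit, and the first two both reject 2. because they insist on at least one digit after the decimal point.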