are represented by placing a dash (-) between two characters. In the example, "[A-Z]" matches a single uppercase letter. If the first character in the brackets is "^", the expression accepts any character other than those indicated. However, "[^Z]" is not the same as "[A-Y]", which matches only the uppercase letters A-Y; "[^Z]" matches any character other than capital Z, including lowercase letters and nonletters such as the newline character. Ranges in character classes are determined by the letters' integer values. In this example, "[A-Za-z]" matches all uppercase and lowercase letters. The range "[A-z]" matches all letters and also matches those characters (such as [ and \) with an integer value between uppercase Z and lowercase a (for more information on integer values of characters see Appendix B). Like predefined character classes, character classes delimited by square brackets match a single character in the search object.
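The behavior of ranges and negated classes described above can be checked directly with String method matches. The following is a minimal sketch, not code from the figures; the class name CharacterClassDemo and helper matchesWhole are hypothetical names introduced only for illustration.

```java
class CharacterClassDemo {
    // Hypothetical helper: true only if the ENTIRE input matches the regular expression.
    static boolean matchesWhole(String regex, String input) {
        return input.matches(regex);
    }

    public static void main(String[] args) {
        System.out.println(matchesWhole("[A-Z]", "Q"));    // true: one uppercase letter
        System.out.println(matchesWhole("[A-Z]", "QR"));   // false: a class matches a single character
        System.out.println(matchesWhole("[A-Y]", "q"));    // false: lowercase q is outside A-Y
        System.out.println(matchesWhole("[^Z]", "q"));     // true: any character except capital Z
        System.out.println(matchesWhole("[^Z]", "\n"));    // true: even the newline character
        System.out.println(matchesWhole("[^Z]", "Z"));     // false
        System.out.println(matchesWhole("[A-z]", "["));    // true: '[' (91) lies between 'Z' (90) and 'a' (97)
        System.out.println(matchesWhole("[A-z]", "\\"));   // true: '\' has integer value 92
        System.out.println(matchesWhole("[A-Za-z]", "[")); // false: letters only
    }
}
```

The last three lines show why "[A-Za-z]" is usually the safer choice when only letters are intended.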
In line 9, the asterisk after the second character class indicates that any number of letters can be matched. In general, when the regular-expression operator "*" appears in a regular expression, the application attempts to match zero or more occurrences of the subexpression immediately preceding the "*". Operator "+" attempts to match one or more occurrences of the subexpression immediately preceding "+". So both "A*" and "A+" will match "AAA" or "A", but only "A*" will match an empty string.
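A short sketch can make the difference between "*" and "+" concrete. The figure's code is not reproduced here, so the pattern FIRST_NAME below is an assumption based on the description (an uppercase letter followed by any number of letters); the class and method names are likewise hypothetical.

```java
class StarPlusDemo {
    // Assumed first-name pattern: one uppercase letter, then zero or more letters.
    static final String FIRST_NAME = "[A-Z][a-zA-Z]*";

    // Hypothetical helper: true if the whole string matches the assumed pattern.
    static boolean isFirstName(String s) {
        return s.matches(FIRST_NAME);
    }

    public static void main(String[] args) {
        System.out.println("AAA".matches("A*")); // true
        System.out.println("AAA".matches("A+")); // true
        System.out.println("".matches("A*"));    // true: zero occurrences are allowed
        System.out.println("".matches("A+"));    // false: at least one A is required

        System.out.println(isFirstName("Jane")); // true
        System.out.println(isFirstName("J"));    // true: the * part may match nothing
        System.out.println(isFirstName("jane")); // false: must start with an uppercase letter
    }
}
```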
If method validateFirstName returns true (line 29 of Fig. 14.21), the application attempts to validate the last name (line 31) by calling validateLastName (lines 13-16 of Fig. 14.20). The regular expression to validate the last name matches any number of letters split by spaces, apostrophes or hyphens.
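One way to express "letters split by spaces, apostrophes or hyphens" is sketched below. The pattern LAST_NAME is an assumption written to fit that description, not necessarily the expression used in Fig. 14.20, and the class and method names are hypothetical.

```java
class LastNameDemo {
    // Assumed pattern: a run of letters, optionally followed by more runs,
    // each introduced by a single space, apostrophe or hyphen.
    static final String LAST_NAME = "[a-zA-Z]+([ '-][a-zA-Z]+)*";

    // Hypothetical helper: true if the whole string matches the assumed pattern.
    static boolean isLastName(String s) {
        return s.matches(LAST_NAME);
    }

    public static void main(String[] args) {
        System.out.println(isLastName("O'Brien"));     // true
        System.out.println(isLastName("Smith-Jones")); // true
        System.out.println(isLastName("de la Cruz"));  // true
        System.out.println(isLastName("Smith-"));      // false: a separator must be followed by letters
        System.out.println(isLastName(""));            // false: at least one letter is required
    }
}
```

Placing the hyphen last inside the brackets keeps it a literal character rather than the start of a range.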
Line 33 of Fig. 14.21 calls method validateAddress (lines 19-23 of Fig. 14.20) to validate the address. The first character class matches any digit one or more times (\\d+). Two \ characters are used, because \ normally starts an escape sequence in a string. So \\d in a String represents the regular-expression pattern \d. Then we match one or more white-space characters (\\s+). The character "|" matches the expression to its left or to its right. For example, "Hi (John|Jane)" matches both "Hi John" and "Hi Jane". The parentheses are used to group parts of the regular expression. In this example, the left side of | matches a single word, and the right side matches two words separated by any amount of white space. So the address must contain a number followed by one or two words. Therefore, "10 Broadway" and "10 Main Street" are both valid addresses in this example. The city (lines 26-29 of Fig. 14.20) and state (lines 32-35 of Fig. 14.20) methods also match any word of at least one character or, alternatively, any two words of at least one character if the words are separated by a single space, so both Waltham and West Newton would match.
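The escaping, grouping and alternation rules just described can be tried out with a small sketch. The patterns ADDRESS and CITY_OR_STATE are assumptions reconstructed from the prose, not copies of the code in Fig. 14.20, and the class and method names are hypothetical.

```java
class AddressDemo {
    // Assumed pattern: digits, white space, then one word OR two words separated by white space.
    static final String ADDRESS = "\\d+\\s+([a-zA-Z]+|[a-zA-Z]+\\s+[a-zA-Z]+)";

    // Assumed pattern: one word, or two words separated by a single white-space character.
    static final String CITY_OR_STATE = "([a-zA-Z]+|[a-zA-Z]+\\s[a-zA-Z]+)";

    static boolean isAddress(String s) {
        return s.matches(ADDRESS);
    }

    static boolean isCityOrState(String s) {
        return s.matches(CITY_OR_STATE);
    }

    public static void main(String[] args) {
        // In Java source, "\\d" is a two-character String: a backslash followed by d.
        System.out.println("\\d".length());                      // 2

        System.out.println("Hi John".matches("Hi (John|Jane)")); // true
        System.out.println("Hi Jane".matches("Hi (John|Jane)")); // true
        System.out.println("Hi Joe".matches("Hi (John|Jane)"));  // false

        System.out.println(isAddress("10 Broadway"));            // true
        System.out.println(isAddress("10 Main Street"));         // true
        System.out.println(isAddress("Main Street"));            // false: no leading number

        System.out.println(isCityOrState("Waltham"));            // true
        System.out.println(isCityOrState("West Newton"));        // true
    }
}
```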
Quantifiers
The asterisk (*) and plus (+) are formally called quantifiers. Figure 14.22 lists all the quantifiers. We've already discussed how the asterisk (*) and plus (+) quantifiers work. All quantifiers affect only the subexpression immediately preceding the quantifier. Quantifier question mark (?) matches zero or one occurrence of the expression that it quantifies. A set of braces containing one number ({n}) matches exactly n occurrences of the expression it quantifies. We demonstrate this quantifier to validate the zip code in Fig. 14.20 at line 40. Including a comma after the number enclosed in braces ({n,}) matches at least n occurrences of the quantified expression. A set of braces containing two numbers ({n,m}) matches between n and m occurrences, inclusive, of the expression that it quantifies. Quantifiers may be applied to patterns enclosed in parentheses to create more complex regular expressions.
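The remaining quantifiers can be compared side by side in a short sketch. The five-digit zip-code pattern is an assumption consistent with the description of {n}; the other patterns, and the names QuantifierDemo, matchesWhole and isZip, are hypothetical examples chosen only to exercise each quantifier.

```java
class QuantifierDemo {
    // Hypothetical helper: true only if the entire input matches the regular expression.
    static boolean matchesWhole(String regex, String input) {
        return input.matches(regex);
    }

    // Assumed zip-code check: exactly five digits.
    static boolean isZip(String s) {
        return s.matches("\\d{5}");
    }

    public static void main(String[] args) {
        System.out.println(matchesWhole("colou?r", "color"));   // true: ? allows zero u's
        System.out.println(matchesWhole("colou?r", "colour"));  // true: ? allows one u
        System.out.println(matchesWhole("colou?r", "colouur")); // false: two u's are too many

        System.out.println(isZip("12345"));                     // true: exactly five digits
        System.out.println(isZip("1234"));                      // false

        System.out.println(matchesWhole("\\d{3,}", "12"));      // false: fewer than three digits
        System.out.println(matchesWhole("\\d{3,}", "123456"));  // true: at least three digits

        System.out.println(matchesWhole("\\d{2,4}", "1234"));   // true: between two and four digits
        System.out.println(matchesWhole("\\d{2,4}", "12345"));  // false: more than four digits

        // A quantifier after parentheses applies to the whole group.
        System.out.println(matchesWhole("(ab){2}", "abab"));    // true
        System.out.println(matchesWhole("(ab){2}", "ab"));      // false
    }
}
```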