Java Reference
In-Depth Information
are represented by placing a dash ( - ) between two characters. In the example, "[A-Z]"
matches a single uppercase letter. If the first character in the brackets is "^" , the expression
accepts any character other than those indicated. However, "[^Z]" is not the same as
"[A-Y]" , which matches only the uppercase letters A-Y; "[^Z]" matches any character other
than capital Z, including lowercase letters and nonletters such as the newline character. Ranges in
character classes are determined by the letters' integer values. In this example, "[A-Za-z]"
matches all uppercase and lowercase letters. The range "[A-z]" matches all letters and also
matches those characters (such as [ and \ ) with an integer value between uppercase Z and
lowercase a (for more information on integer values of characters see Appendix B). Like
predefined character classes, character classes delimited by square brackets match a single
character in the search object.
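The character-class rules above can be checked directly with String.matches, which succeeds only when the entire string matches the pattern. The following is a minimal sketch; the class name CharacterClassDemo and the helper matchesClass are made up for illustration and do not appear in the figures.

```java
// Illustrative sketch (names are hypothetical, not from Fig. 14.20).
class CharacterClassDemo {
    // Returns true only if the WHOLE string matches the regular expression.
    static boolean matchesClass(String regex, String s) {
        return s.matches(regex);
    }

    public static void main(String[] args) {
        System.out.println(matchesClass("[A-Z]", "Q"));    // one uppercase letter
        System.out.println(matchesClass("[A-Z]", "q"));    // lowercase is outside the range
        System.out.println(matchesClass("[^Z]", "a"));     // anything except capital Z
        System.out.println(matchesClass("[^Z]", "\n"));    // including a newline
        System.out.println(matchesClass("[A-Y]", "a"));    // [A-Y] is narrower than [^Z]
        System.out.println(matchesClass("[A-z]", "["));    // '[' (91) lies between 'Z' (90) and 'a' (97)
        System.out.println(matchesClass("[A-Za-z]", "[")); // letters only
        System.out.println(matchesClass("[A-Z]", "AB"));   // a class matches exactly one character
    }
}
```

Because "[A-z]" silently accepts punctuation such as [ and \ , spelling out both ranges as "[A-Za-z]" is usually the safer choice when only letters are wanted.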
In line 9, the asterisk after the second character class indicates that any number of letters
can be matched. In general, when the regular-expression operator "*" appears in a regular
expression, the application attempts to match zero or more occurrences of the
subexpression immediately preceding the "*" . Operator "+" attempts to match one or
more occurrences of the subexpression immediately preceding "+" . So both "A*" and "A+"
will match "AAA" or "A" , but only "A*" will match an empty string.
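The difference between "*" and "+" shows up only on the empty string. A small sketch, using hypothetical helper names zeroOrMore and oneOrMore, makes the comparison concrete:

```java
// Illustrative sketch (class and method names are hypothetical).
class StarPlusDemo {
    // "A*" accepts zero or more A's.
    static boolean zeroOrMore(String s) {
        return s.matches("A*");
    }

    // "A+" requires at least one A.
    static boolean oneOrMore(String s) {
        return s.matches("A+");
    }

    public static void main(String[] args) {
        String[] inputs = {"AAA", "A", ""};
        for (String input : inputs) {
            System.out.printf("\"%s\": A* -> %b, A+ -> %b%n",
                input, zeroOrMore(input), oneOrMore(input));
        }
    }
}
```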
If method validateFirstName returns true (line 29 of Fig. 14.21), the application
attempts to validate the last name (line 31) by calling validateLastName (lines 13-16 of
Fig. 14.20). The regular expression to validate the last name matches any number of letters
split by spaces, apostrophes or hyphens.
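Fig. 14.20 is not reproduced here, so the following is only a sketch of what the two name validators might look like. The method names come from the text; the class name and both patterns are assumptions derived from the prose description (an uppercase letter followed by any number of letters for the first name, and groups of letters split by a space, apostrophe or hyphen for the last name).

```java
// Sketch under stated assumptions: the patterns are reconstructed from the
// prose, not copied from Fig. 14.20.
class NameValidationSketch {
    // Assumed pattern: one uppercase letter, then zero or more letters.
    static boolean validateFirstName(String firstName) {
        return firstName.matches("[A-Z][a-zA-Z]*");
    }

    // Assumed pattern: one or more letters, optionally followed by further
    // groups of letters, each introduced by a space, apostrophe or hyphen.
    static boolean validateLastName(String lastName) {
        return lastName.matches("[a-zA-Z]+([ '-][a-zA-Z]+)*");
    }

    public static void main(String[] args) {
        System.out.println(validateFirstName("Jane"));       // capitalized word
        System.out.println(validateFirstName("jane"));       // missing leading capital
        System.out.println(validateLastName("O'Brien"));     // apostrophe separator
        System.out.println(validateLastName("Smith-Jones")); // hyphen separator
        System.out.println(validateLastName("Smith-"));      // dangling separator
    }
}
```

Note that the hyphen is placed last inside the character class [ '-] so that it is read as a literal hyphen rather than as a range operator.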
Line 33 of Fig. 14.21 calls method validateAddress (lines 19-23 of Fig. 14.20) to
validate the address. The first character class matches any digit one or more times ( \\d+ ).
Two \ characters are used, because \ normally starts an escape sequence in a string. So \\d
in a String represents the regular-expression pattern \d . Then we match one or more
white-space characters ( \\s+ ). The character "|" matches the expression to its left or to its
right. For example, "Hi (John|Jane)" matches both "Hi John" and "Hi Jane" . The parentheses
are used to group parts of the regular expression. In the address expression, the left side of |
matches a single word, and the right side matches two words separated by any amount of
white space. So the address must contain a number followed by one or two words. Therefore,
"10 Broadway" and "10 Main Street" are both valid addresses in this example. The
city (lines 26-29 of Fig. 14.20) and state (lines 32-35 of Fig. 14.20) methods also match
any word of at least one character or, alternatively, any two words of at least one character if
the words are separated by a single space, so both Waltham and West Newton would match.
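The address and city rules described above can be sketched as follows. The method name validateAddress comes from the text; the class name, the name validateCity and the exact patterns are assumptions reconstructed from the prose, since Fig. 14.20 is not shown here. Note the doubled backslashes: in Java source, "\\d" is the two-character regular-expression pattern \d.

```java
// Sketch under stated assumptions: patterns follow the prose description,
// not the actual code of Fig. 14.20.
class AddressValidationSketch {
    // Digits, white space, then either one word or two words separated by
    // white space (assumed pattern).
    static boolean validateAddress(String address) {
        return address.matches("\\d+\\s+([a-zA-Z]+|[a-zA-Z]+\\s+[a-zA-Z]+)");
    }

    // One word, or two words separated by a single white-space character
    // (assumed pattern; validateCity is a hypothetical name).
    static boolean validateCity(String city) {
        return city.matches("([a-zA-Z]+|[a-zA-Z]+\\s[a-zA-Z]+)");
    }

    public static void main(String[] args) {
        // Alternation inside a group: either name is accepted after "Hi ".
        System.out.println("Hi John".matches("Hi (John|Jane)"));
        System.out.println("Hi Jane".matches("Hi (John|Jane)"));

        System.out.println(validateAddress("10 Broadway"));    // number + one word
        System.out.println(validateAddress("10 Main Street")); // number + two words
        System.out.println(validateAddress("Broadway"));       // no street number
        System.out.println(validateCity("Waltham"));
        System.out.println(validateCity("West Newton"));
    }
}
```

Without the parentheses, the | operator would split the whole expression in two, so the leading "\\d+\\s+" would apply only to the single-word alternative.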
Quantifiers
The asterisk ( * ) and plus ( + ) are formally called quantifiers . Figure 14.22 lists all the
quantifiers. We've already discussed how the asterisk ( * ) and plus ( + ) quantifiers work. All
quantifiers affect only the subexpression immediately preceding the quantifier. Quantifier
question mark ( ? ) matches zero or one occurrence of the expression that it quantifies. A
set of braces containing one number ( { n } ) matches exactly n occurrences of the expression
it quantifies. We use this quantifier to validate the zip code in Fig. 14.20 at line
40. Including a comma after the number enclosed in braces ( { n ,} ) matches at least n occurrences
of the quantified expression. The set of braces containing two numbers ( { n , m } ) matches
between n and m occurrences of the expression that it quantifies. Quantifiers may be applied
to patterns enclosed in parentheses to create more complex regular expressions.
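Each quantifier described above can be tried in isolation. In this sketch the class name, the helper names and the sample patterns are hypothetical; in particular, the five-digit zip-code pattern is an assumption, because line 40 of Fig. 14.20 is not reproduced here.

```java
// Illustrative sketch (names and sample patterns are hypothetical).
class QuantifierDemo {
    // {n}: exactly n occurrences. Assumes a five-digit zip code.
    static boolean validateZip(String zip) {
        return zip.matches("\\d{5}");
    }

    // ?: zero or one occurrence of the preceding 'u'.
    static boolean isColor(String s) {
        return s.matches("colou?r");
    }

    // {n,}: at least two digits.
    static boolean atLeastTwoDigits(String s) {
        return s.matches("\\d{2,}");
    }

    // {n,m}: between two and four digits, inclusive.
    static boolean twoToFourDigits(String s) {
        return s.matches("\\d{2,4}");
    }

    // A quantifier applied to a parenthesized group repeats the whole group.
    static boolean repeatedAb(String s) {
        return s.matches("(ab)+");
    }

    public static void main(String[] args) {
        System.out.println(validateZip("02453"));       // five digits
        System.out.println(validateZip("2453"));        // only four digits
        System.out.println(isColor("color"));           // 'u' absent
        System.out.println(isColor("colour"));          // 'u' present once
        System.out.println(atLeastTwoDigits("123456")); // no upper bound
        System.out.println(twoToFourDigits("12345"));   // five digits is too many
        System.out.println(repeatedAb("ababab"));       // group repeated three times
    }
}
```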
 