• Character method charValue (p. 622) returns the char stored in a Character object.
• Character method toString returns a String representation of a Character.
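The two Character methods above can be sketched as follows. This is a minimal illustration, not code from the chapter; the class name CharacterDemo and the helper names unwrap and asText are hypothetical.

```java
class CharacterDemo {
    // Returns the primitive char stored in a Character object.
    static char unwrap(Character c) {
        return c.charValue();
    }

    // Returns a one-character String representation of a Character.
    static String asText(Character c) {
        return c.toString();
    }

    public static void main(String[] args) {
        Character boxed = Character.valueOf('A');
        System.out.println(unwrap(boxed)); // prints A
        System.out.println(asText(boxed)); // prints A
    }
}
```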
Section 14.6 Tokenizing Strings
• Class String's split method (p. 623) tokenizes a String based on the delimiter (p. 623) specified as an argument and returns an array of Strings containing the tokens (p. 623).
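As a rough sketch of tokenizing with split, the example below uses a single space as the delimiter. The class name TokenizeDemo, the helper tokenize and the sample sentence are assumptions made for illustration.

```java
import java.util.Arrays;

class TokenizeDemo {
    // Splits a sentence into tokens, using a single space as the delimiter.
    static String[] tokenize(String sentence) {
        return sentence.split(" ");
    }

    public static void main(String[] args) {
        String[] tokens = tokenize("This is a sentence with seven tokens");
        System.out.println(tokens.length); // prints 7
        System.out.println(Arrays.toString(tokens));
    }
}
```

Because the argument to split is treated as a regular expression, a delimiter that contains special regular-expression characters would need to be escaped.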
Section 14.7 Regular Expressions, Class Pattern and Class Matcher
• Regular expressions (p. 624) are sequences of characters and symbols that define a set of strings.
They're useful for validating input and ensuring that data is in a particular format.
• String method matches (p. 624) receives a string that specifies the regular expression and matches the contents of the String object on which it's called to the regular expression. The method returns a boolean indicating whether the match succeeded.
• A character class is an escape sequence that represents a group of characters. Each character class
matches a single character in the string we're attempting to match with the regular expression.
• A word character (\w; p. 624) is any letter (uppercase or lowercase), any digit or the underscore character.
• A white-space character (\s) is a space, a tab, a carriage return, a newline or a form feed.
• A digit (\d) is any numeric character.
• To match a set of characters that does not have a predefined character class (p. 624), use square brackets, []. Ranges can be represented by placing a dash (-) between two characters. If the first character in the brackets is "^", the expression accepts any character other than those indicated.
• When the regular expression operator "*" appears in a regular expression, the program attempts to match zero or more occurrences of the subexpression immediately preceding the "*".
• Operator "+" attempts to match one or more occurrences of the subexpression preceding it.
• The character "|" allows a match of the expression to its left or to its right.
• Parentheses () are used to group parts of the regular expression.
• The asterisk (*) and plus (+) are formally called quantifiers (p. 628).
• A quantifier affects only the subexpression immediately preceding it.
• The question mark quantifier (?) matches zero or one occurrence of the expression that it quantifies.
• A set of braces containing one number ({n}) matches exactly n occurrences of the expression it quantifies. Including a comma after the number enclosed in braces ({n,}) matches at least n occurrences.
• A set of braces containing two numbers ({n,m}) matches between n and m occurrences (inclusive) of the expression that it quantifies.
• Quantifiers are greedy (p. 629): they'll match as many occurrences as they can as long as the match is successful. If a quantifier is followed by a question mark (?), the quantifier becomes reluctant (p. 629), matching as few occurrences as possible as long as the match is successful.
• String method replaceAll (p. 629) replaces text in a string with new text (the second argument) wherever the original string matches a regular expression (the first argument).
• Escaping a special regular-expression character with a \ instructs the regular-expression matching engine to find the actual character, as opposed to what it represents in a regular expression.
• String method replaceFirst (p. 629) replaces the first occurrence of a pattern match and returns a new string in which the appropriate characters have been replaced.
• String method split (p. 629) divides a string into substrings at any location that matches a specified regular expression and returns an array of the substrings.
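The regular-expression points in this section can be tried out with a short sketch like the one below. It is illustrative only: the class name RegexDemo, the helper method names and the sample strings are all assumptions, not material from the chapter. Note that in Java source code each regular-expression backslash is written as \\ inside a string literal, and that matches succeeds only when the entire String fits the pattern.

```java
import java.util.Arrays;

class RegexDemo {
    // Bracketed ranges plus the * quantifier: one uppercase letter
    // followed by zero or more lowercase letters.
    static boolean isCapitalizedWord(String s) {
        return s.matches("[A-Z][a-z]*");
    }

    // Predefined character class \d with the {n} quantifier: exactly five digits.
    static boolean isFiveDigitCode(String s) {
        return s.matches("\\d{5}");
    }

    // Greedy: .* consumes as much as it can, from the first < to the last >.
    static String stripGreedy(String s) {
        return s.replaceAll("<.*>", "");
    }

    // Reluctant: .*? consumes as little as it can, so each tag is removed separately.
    static String stripReluctant(String s) {
        return s.replaceAll("<.*?>", "");
    }

    // The * is escaped so the engine looks for a literal asterisk.
    static String caretForStars(String s) {
        return s.replaceAll("\\*", "^");
    }

    // Only the first run of one or more digits is replaced.
    static String maskFirstNumber(String s) {
        return s.replaceFirst("\\d+", "#");
    }

    // Splits at each comma followed by zero or more white-space characters.
    static String[] splitOnCommas(String s) {
        return s.split(",\\s*");
    }

    public static void main(String[] args) {
        System.out.println(isCapitalizedWord("Jane"));            // prints true
        System.out.println(isFiveDigitCode("1234"));              // prints false
        System.out.println(stripGreedy("<b>bold</b> text"));      // prints " text" (without quotes)
        System.out.println(stripReluctant("<b>bold</b> text"));   // prints bold text
        System.out.println(caretForStars("5 stars *****"));       // prints 5 stars ^^^^^
        System.out.println(maskFirstNumber("1, 2, 3"));           // prints #, 2, 3
        System.out.println(Arrays.toString(splitOnCommas("1, 2,3,   4"))); // prints [1, 2, 3, 4]
    }
}
```

Comparing stripGreedy with stripReluctant on the same input shows the difference between the two quantifier modes: the greedy version removes everything between the first < and the last >, while the reluctant version removes only the tags themselves.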