Strings, Characters and Regular Expressions - Java: How to Program Early Objects


• Character method charValue (p. 622) returns the char stored in a Character object. Character method toString returns a String representation of a Character.
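As a minimal sketch of the two Character methods described above, the following example unwraps a Character object and converts it to a String. The class name CharacterDemo and the helper names unwrap and describe are hypothetical, chosen only for illustration.

```java
class CharacterDemo {
    // Returns the primitive char stored in the Character object.
    static char unwrap(Character boxed) {
        return boxed.charValue();
    }

    // Returns a one-character String representation of the Character.
    static String describe(Character boxed) {
        return boxed.toString();
    }

    public static void main(String[] args) {
        Character letter = Character.valueOf('A');
        System.out.println(unwrap(letter));   // A
        System.out.println(describe(letter)); // A
    }
}
```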

Section 14.6 Tokenizing Strings

• Class String's split method (p. 623) tokenizes a String based on the delimiter (p. 623) specified as an argument and returns an array of Strings containing the tokens (p. 623).
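The tokenizing step above can be sketched as follows, assuming a single space as the delimiter. The class name TokenizeDemo, the helper tokenize and the sample sentence are hypothetical and serve only as an illustration.

```java
class TokenizeDemo {
    // Splits the sentence into tokens, using a single space as the delimiter.
    static String[] tokenize(String sentence) {
        return sentence.split(" ");
    }

    public static void main(String[] args) {
        String[] tokens = tokenize("This is a sentence");
        System.out.println("Number of tokens: " + tokens.length); // Number of tokens: 4
        for (String token : tokens) {
            System.out.println(token);
        }
    }
}
```

Note that with a single-space delimiter, consecutive spaces in the input would produce empty tokens; a regular expression such as "\\s+" is one way to treat a run of white space as one delimiter.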

Section 14.7 Regular Expressions, Class Pattern and Class Matcher

• Regular expressions (p. 624) are sequences of characters and symbols that define a set of strings. They're useful for validating input and ensuring that data is in a particular format.

• String method matches (p. 624) receives a string that specifies the regular expression and matches the entire contents of the String object on which it's called against the regular expression. The method returns a boolean indicating whether the match succeeded.

• A character class is an escape sequence that represents a group of characters. Each character class matches a single character in the string we're attempting to match with the regular expression.

• A word character (\w; p. 624) is any letter (uppercase or lowercase), any digit or the underscore character.

• A white-space character (\s) is a space, a tab, a carriage return, a newline, a form feed or a vertical tab.

• A digit (\d) is any numeric character, 0 through 9.

• To match a set of characters that does not have a predefined character class (p. 624), use square brackets, []. Ranges can be represented by placing a dash (-) between two characters. If the first character in the brackets is "^", the expression accepts any character other than those indicated.

• When the regular expression operator "*" appears in a regular expression, the program attempts to match zero or more occurrences of the subexpression immediately preceding the "*".

• Operator "+" attempts to match one or more occurrences of the subexpression preceding it.

• The character "|" allows a match of the expression to its left or to its right.

• Parentheses () are used to group parts of the regular expression.

• The asterisk (*) and plus (+) are formally called quantifiers (p. 628).

• A quantifier affects only the subexpression immediately preceding it.

• Quantifier question mark (?) matches zero or one occurrences of the expression that it quantifies.

• A set of braces containing one number ({n}) matches exactly n occurrences of the expression it quantifies. Including a comma after the number enclosed in braces ({n,}) matches at least n occurrences.

• A set of braces containing two numbers ({n,m}) matches between n and m occurrences, inclusive, of the expression that it quantifies.

• Quantifiers are greedy (p. 629): they'll match as many occurrences as they can as long as the match is successful. If a quantifier is followed by a question mark (?), the quantifier becomes reluctant (p. 629), matching as few occurrences as possible as long as the match is successful.

• String method replaceAll (p. 629) replaces text in a string with new text (the second argument) wherever the original string matches a regular expression (the first argument).

• Escaping a special regular-expression character with a \ instructs the regular-expression matching engine to find the actual character, as opposed to what it represents in a regular expression.

• String method replaceFirst (p. 629) replaces the first occurrence of a pattern match and returns a new string in which the appropriate characters have been replaced.

• String method split (p. 629) divides a string into substrings at any location that matches a specified regular expression and returns an array of the substrings.
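Several of the points in this section can be tried out together in one small program. The sketch below is only an illustration: the class name RegexSummaryDemo, the helper method names, the chosen patterns and the sample inputs are all assumptions and do not come from the chapter. Note that each backslash in a regular expression must be doubled inside a Java string literal.

```java
import java.util.Arrays;

class RegexSummaryDemo {
    // Character classes and the * quantifier:
    // one uppercase letter followed by zero or more letters.
    static boolean isCapitalizedWord(String s) {
        return s.matches("[A-Z][a-zA-Z]*");
    }

    // Exact-count quantifier {n}: exactly five digits (an assumed format).
    static boolean isFiveDigits(String s) {
        return s.matches("\\d{5}");
    }

    // Grouping, alternation and the ? quantifier:
    // "cat" or "dog", optionally followed by "s".
    static boolean isPet(String s) {
        return s.matches("(cat|dog)s?");
    }

    // Greedy: .+ takes as much as it can, so the match runs to the last '>'.
    static String removeFirstTagGreedy(String s) {
        return s.replaceFirst("<.+>", "");
    }

    // Reluctant: .+? takes as little as it can, so the match stops at the first '>'.
    static String removeFirstTagReluctant(String s) {
        return s.replaceFirst("<.+?>", "");
    }

    // Escaping: \* matches a literal asterisk instead of acting as a quantifier.
    static String starsToCarets(String s) {
        return s.replaceAll("\\*", "^");
    }

    // split with a regular-expression delimiter:
    // a comma followed by zero or more white-space characters.
    static String[] splitOnCommas(String s) {
        return s.split(",\\s*");
    }

    public static void main(String[] args) {
        System.out.println(isCapitalizedWord("Jane"));               // true
        System.out.println(isCapitalizedWord("jane"));               // false
        System.out.println(isFiveDigits("12345"));                   // true
        System.out.println(isFiveDigits("1234"));                    // false
        System.out.println(isPet("dogs"));                           // true
        System.out.println(removeFirstTagGreedy("<b>bold</b>"));     // prints an empty line
        System.out.println(removeFirstTagReluctant("<b>bold</b>"));  // bold</b>
        System.out.println(starsToCarets("2 * 3 * 4"));              // 2 ^ 3 ^ 4
        System.out.println(Arrays.toString(splitOnCommas("a, b,c,   d"))); // [a, b, c, d]
    }
}
```

Because matches tests the whole string, isFiveDigits("123456") returns false even though the input contains five consecutive digits.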
