Java Reference
Now that we have a testing program and know how to use it, we need to focus on the regular
expression syntax that Java supports. Regular expression syntax is almost a language unto itself, so we'll
focus on the basics and some of the more commonly used advanced bits. The whole subject is worthy of a
book (and such books exist).
Our simple test case uses a string literal. A string literal is just a piece of text. In the example we just
ran, "Sam" is a string literal. "Spade" is another string literal. If we replace "Sam" with "Spade", we get the
following output in the console:
Found a match for Spade beginning at 4 and ending at 9
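The testing program itself is not reproduced in this section, so the following is only a minimal sketch of what such a literal search could look like with java.util.regex. The class name LiteralMatchSketch, the helper describeMatches, and the input string "Sam Spade" are assumptions, chosen to be consistent with the output shown above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LiteralMatchSketch {
    // Hypothetical helper: builds one message per match of regex within input.
    static List<String> describeMatches(String regex, String input) {
        List<String> messages = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(input);
        while (matcher.find()) {
            // start() is the index of the first matched character;
            // end() is the index just past the last matched character.
            messages.add("Found a match for " + matcher.group()
                    + " beginning at " + matcher.start()
                    + " and ending at " + matcher.end());
        }
        return messages;
    }

    public static void main(String[] args) {
        // Assumed input; the original test string is not shown in this section.
        for (String message : describeMatches("Spade", "Sam Spade")) {
            System.out.println(message);
        }
    }
}
```

Under that assumed input, "Spade" occupies indexes 4 through 8, so the reported end index of 9 is one past the last matched character.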
We won't be able to accomplish much with just string literals. We can find all the instances of a
particular string, but we can't find anything that matches a pattern. To create a pattern, we have to dive
into the key component of regular expressions: metacharacters.
Metacharacters are characters that create patterns. Rather than standing for itself as a literal character, a
metacharacter has a special meaning: it can stand for a set of characters, a position in the input, a
repetition count, or a grouping. Some metacharacters work by themselves, while other
metacharacters are meaningless in the absence of other metacharacters. Table 15-1 describes the
metacharacters supported by the Java regular expression syntax.
Table 15-1. Java Regular Expression Metacharacters
Metacharacter
Description
(
Starts a subpattern (a pattern within the larger pattern). For example, compan(y|ies)
lets you match either “company” or “companies”.
Also starts the definition of a group. (Dog) treats those three characters as a single
unit for other regular expression operators.
[
Starts a set of characters. For example, [A-Z] would match any uppercase
character. A[A-Z]Z would match “AAZ”, “ABZ”, and so on to “AZZ”.
{
Starts a match count specifier. For example, s{3} would match three s characters in
a row: sss. Pas{3} would match “Passs”.
\
Starts an escape sequence, so that you can match a literal instance of a
metacharacter. For example, if you needed to match the periods in a paragraph,
you'd use \. (that is, a backslash and a period). The period character (.) is itself a
regular expression metacharacter, so you must escape it to find the actual periods.
Similarly, to find an actual backslash character, you must escape the escape
character, thus: \\
^
Matches the start of the input string. ^A matches input that begins with “A”. ^[0-9] matches
input that begins with a digit. ^[0-9]{2} matches input that begins with two digits.
^[0-9]+ matches input that begins with one or more digits, that is, a number of any size.
To match at the start of every line rather than only at the start of the input, compile
the pattern with the Pattern.MULTILINE flag.
Inside a character set, ^ is the negation character. [^abc] matches any character other
than a, b, or c. [^abc]at matches “rat” and “sat” and “eat” (and many others) but not
“bat” or “cat” (or “aat”).
-
Used within a character set to define a range, such as [0-9], which would match any digit.
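The table entries above can be tried out with a short sketch like the one below. The patterns come from the table; the class name MetacharacterSketch, the helper found, and the sample input strings are assumptions made for illustration. One detail the sketch relies on: inside a Java string literal a backslash must itself be doubled, so the regular expression \. is written "\\." in source code, and \\ is written "\\\\".

```java
import java.util.regex.Pattern;

class MetacharacterSketch {
    // Hypothetical helper: true if regex matches anywhere inside input.
    static boolean found(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        // ( starts a group; | separates the alternatives inside it.
        System.out.println(found("compan(y|ies)", "two companies"));  // true

        // [ starts a character set; - defines a range within it.
        System.out.println(found("A[A-Z]Z", "ABZ"));                  // true
        System.out.println(found("A[A-Z]Z", "AbZ"));                  // false

        // { starts a match count specifier.
        System.out.println(found("Pas{3}", "Passs"));                 // true
        System.out.println(found("Pas{3}", "Pass"));                  // false

        // \ escapes a metacharacter (doubled in Java source code).
        System.out.println(found("\\.", "The end."));                 // true
        System.out.println(found("\\.", "The end"));                  // false
        System.out.println(found("\\\\", "C:\\temp"));                // true

        // ^ anchors the match to the start of the input.
        System.out.println(found("^[0-9]+", "42 is the answer"));     // true
        System.out.println(found("^[0-9]+", "answer: 42"));           // false

        // With Pattern.MULTILINE, ^ also matches at the start of each line.
        Pattern perLine = Pattern.compile("^A", Pattern.MULTILINE);
        System.out.println(perLine.matcher("x\nAbc").find());         // true
        System.out.println(found("^A", "x\nAbc"));                    // false

        // Inside a character set, ^ negates the set.
        System.out.println(found("[^abc]at", "rat"));                 // true
        System.out.println(found("[^abc]at", "cat"));                 // false
    }
}
```

The helper uses Matcher.find() rather than String.matches() because find() looks for the pattern anywhere in the input, whereas matches() would require the whole input to match.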