Java Reference
In-Depth Information
The search and replace capability can be used to solve very simple problems. For example, if you want
to make sure that any sequence of one or more whitespace characters is replaced by a single space, you
can define the regular expression as "\\s+" and the replacement string as a single space, " ". To
eliminate all whitespace at the beginning of each line, you can use the expression "^\\s+" and define the
replacement string as empty, "". Note that ^ matches only at the start of the input unless the pattern
is compiled with the Pattern.MULTILINE flag, in which case it also matches after each line terminator.
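The two replacements described above might be sketched as follows. The class and method names (WhitespaceCleanup, collapseWhitespace, stripLeadingWhitespace) are hypothetical and chosen only for illustration. The sketch assumes the input may span several lines, so the second pattern is compiled with Pattern.MULTILINE.

```java
import java.util.regex.Pattern;

// Hypothetical helper class, for illustration only.
class WhitespaceCleanup {
    // Replace each run of one or more whitespace characters with a single space.
    static String collapseWhitespace(String text) {
        return text.replaceAll("\\s+", " ");
    }

    // Remove whitespace at the start of every line. MULTILINE makes ^ match
    // after each line terminator as well as at the start of the input.
    static String stripLeadingWhitespace(String text) {
        return Pattern.compile("^\\s+", Pattern.MULTILINE)
                      .matcher(text)
                      .replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println(collapseWhitespace("a  \t b\n\nc"));
        System.out.println(stripLeadingWhitespace("  first\n\t second"));
    }
}
```

Because \\s also matches line terminators, "^\\s+" in multiline mode swallows blank lines along with the indentation that follows them. If blank lines must survive, a character class such as "[ \\t]+" is one alternative.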
Using Capturing Groups
Earlier we used the group() method for a Matcher object to retrieve the subsequence matched by the
entire pattern defined by the regular expression. The entire pattern represents what is called a capturing
group because the Matcher object captures the subsequence corresponding to the pattern match. Regular
expressions can also define other capturing groups that correspond to parts of the pattern. Each pair of
parentheses in a regular expression defines a separate capturing group in addition to the group that the whole
expression defines. In the earlier example, we defined the regular expression by the statement:
String regEx = "[+|-]?(\\d+(\\.\\d*)?)|(\\.\\d+)";
This defines three capturing groups other than the whole expression: one for the subexpression
(\\d+(\\.\\d*)?), one for the subexpression (\\.\\d*), and one for the subexpression
(\\.\\d+). The Matcher object stores the subsequence that matches the pattern defined by each
capturing group, and you can retrieve each of these subsequences after a successful match.
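As a rough illustration of what the Matcher stores, the sketch below collects every group for the first match found in a string. It uses the expression written without any embedded spaces. The class name GroupDemo and the method firstMatchGroups are hypothetical. Groups are addressed by index, with index 0 standing for the whole match, and groupCount() reports the number of groups excluding group 0.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, for illustration only.
class GroupDemo {
    static final String REG_EX = "[+|-]?(\\d+(\\.\\d*)?)|(\\.\\d+)";

    // Returns groups 0..groupCount() for the first match in text,
    // or an empty array when nothing matches.
    static String[] firstMatchGroups(String text) {
        Matcher m = Pattern.compile(REG_EX).matcher(text);
        if (!m.find()) {
            return new String[0];
        }
        String[] groups = new String[m.groupCount() + 1];
        for (int i = 0; i <= m.groupCount(); i++) {
            groups[i] = m.group(i); // null if the group took no part in the match
        }
        return groups;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(firstMatchGroups("x = -12.75")));
        System.out.println(Arrays.toString(firstMatchGroups(".5")));
    }
}
```

For the input "x = -12.75" the whole match is "-12.75", and the first two groups hold "12.75" and ".75". The third group is null because the second alternative was not used. For ".5" only the third group participates.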
To retrieve the text matching a particular capturing group, you need a way to identify the capturing
group that you are interested in. To this end, capturing groups are numbered. The capturing group for
the whole regular expression is always number 0. The other groups are numbered by counting their
opening parentheses from the left in the regular expression. Thus the first opening parenthesis from the left
corresponds to capturing group 1, the second opening parenthesis corresponds to capturing group 2,
and so on for as many opening parentheses as there are in the whole expression. The diagram below
illustrates how the groups are numbered in an arbitrary regular expression.
Expression: (A(B)(C(D))|(E))
Group 0: (A(B)(C(D))|(E))   (the whole expression)
Group 1: (A(B)(C(D))|(E))
Group 2: (B)
Group 3: (C(D))
Group 4: (D)
Group 5: (E)
Capturing Groups in an Arbitrary Expression
As you see, it is easy to number the capturing groups as long as you can count left parentheses. Group 1
captures the same text as Group 0 because the whole regular expression is parenthesized. The other
capturing groups in sequence are defined by (B), (C(D)), (D), and (E).
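The numbering can be checked with a small experiment. The sketch below assumes that A through E in the arbitrary expression stand for the literal letters themselves, which is an assumption made purely so the pattern can be run. The class name GroupNumbering and the method groupsFor are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, for illustration only.
class GroupNumbering {
    // A..E are treated as literal letters here (an assumption for this sketch).
    static final Pattern PATTERN = Pattern.compile("(A(B)(C(D))|(E))");

    // Returns groups 0..5 when the entire input matches, otherwise an empty array.
    static String[] groupsFor(String input) {
        Matcher m = PATTERN.matcher(input);
        if (!m.matches()) {
            return new String[0];
        }
        String[] groups = new String[m.groupCount() + 1];
        for (int i = 0; i < groups.length; i++) {
            groups[i] = m.group(i); // null for groups in the unused alternative
        }
        return groups;
    }

    public static void main(String[] args) {
        for (String input : new String[] {"ABCD", "E"}) {
            String[] groups = groupsFor(input);
            for (int i = 0; i < groups.length; i++) {
                System.out.println(input + " group " + i + ": " + groups[i]);
            }
        }
    }
}
```

With the input "ABCD", groups 0 and 1 both hold "ABCD", group 2 holds "B", group 3 holds "CD", group 4 holds "D", and group 5 is null. With the input "E", groups 0, 1, and 5 hold "E" and the rest are null, since a group that belongs to the alternative not taken captures nothing.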