Table 14-6. Examples of Groups in Regular Expressions
Regular Expression: AB(XY)
Number of groups reported by Matcher class's groupCount() method: 1
Group Number    Group Text
0               AB(XY)
1               (XY)

Regular Expression: (AB)(XY)
Number of groups reported by Matcher class's groupCount() method: 2
Group Number    Group Text
0               (AB)(XY)
1               (AB)
2               (XY)

Regular Expression: ((A)((X)(Y)))
Number of groups reported by Matcher class's groupCount() method: 5
Group Number    Group Text
0               ((A)((X)(Y)))
1               ((A)((X)(Y)))
2               (A)
3               ((X)(Y))
4               (X)
5               (Y)

Regular Expression: ABXY
Number of groups reported by Matcher class's groupCount() method: 0
Group Number    Group Text
0               ABXY
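The group counts in the table can be checked with a short program. The following is a minimal sketch, not code from the original text: the class name GroupCountDemo, the helper countGroups(), and the sample input "AXY" are assumptions made for illustration. It relies only on Pattern.compile(), Matcher.groupCount(), and Matcher.group(int) from java.util.regex.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the name and helper method are illustrative only.
public class GroupCountDemo {
    // Returns the number of capturing groups in the regex.
    // Group 0 (the whole match) is not included in this count.
    static int countGroups(String regex) {
        return Pattern.compile(regex).matcher("").groupCount();
    }

    public static void main(String[] args) {
        String[] regexes = {"AB(XY)", "(AB)(XY)", "((A)((X)(Y)))", "ABXY"};
        for (String regex : regexes) {
            System.out.println(regex + " has " + countGroups(regex) + " group(s)");
        }

        // Groups are numbered by the position of their opening parenthesis,
        // counted from left to right. "AXY" is an assumed sample input.
        Matcher m = Pattern.compile("((A)((X)(Y)))").matcher("AXY");
        if (m.matches()) {
            for (int i = 0; i <= m.groupCount(); i++) {
                System.out.println("Group " + i + ": " + m.group(i));
            }
        }
    }
}
```

Note that the table lists the part of the pattern that defines each group, whereas group(i) at run time returns the part of the input that the group matched.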
You can also back reference a group by its number within a regular expression. Suppose you want to match text that starts
with "ab", followed by "xy", which is followed by "ab" again. You can write the regular expression as "abxyab". You can also
achieve the same result by forming a group that contains "ab" and back referencing it as "(ab)xy\1". Here, "\1" refers to
group 1, which is "(ab)" in this case. You can use "\2" to refer to group 2, "\3" to refer to group 3, and so on. How will the
regular expression "(ab)xy\12" be interpreted? It uses "\12" as the group back reference. The regular expression engine
detects that "(ab)xy\12" contains only one group, so it treats "\1" as a back reference to group 1, which is "(ab)", and 2 as
an ordinary character. Therefore, the regular expression "(ab)xy\12" matches the same text as "abxyab2". If a regular
expression has 12 or more groups, \12 in the regular expression refers to the twelfth group.
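The back reference rules above can be tried out directly. This is a small sketch under stated assumptions: the class name BackReferenceDemo and the helper matches() are hypothetical, and the input strings are chosen for illustration. Note that in a Java string literal the backslash itself must be escaped, so the regular expression (ab)xy\1 is written as "(ab)xy\\1".

```java
import java.util.regex.Pattern;

// Hypothetical demo class; the name and helper method are illustrative only.
public class BackReferenceDemo {
    // Returns true if the entire input matches the regex.
    static boolean matches(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        // \1 must repeat whatever group 1 matched, which is "ab" here.
        System.out.println(matches("(ab)xy\\1", "abxyab"));   // true
        System.out.println(matches("(ab)xy\\1", "abxyxy"));   // false

        // Only one group exists, so \12 is read as \1 followed by a literal 2.
        System.out.println(matches("(ab)xy\\12", "abxyab2")); // true
    }
}
```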
You can also fetch part of the matched text by using a group number. The group() method in the Matcher class is
overloaded. You have already seen the group() method that takes no arguments. Another version of the method
takes a group number as an argument and returns the text matched by that group. Suppose you have phone numbers
embedded in the input text. Each phone number occurs as a word and is ten digits long. The first three digits are
the area code. The regular expression \b\d{10}\b will match all phone numbers in the input text.
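One way to pull out the area code is to split the ten digits into two groups and read the first one with group(1). The following is a sketch under assumptions, not the original text's listing: the grouped pattern \b(\d{3})(\d{7})\b, the class name AreaCodeDemo, the helper areaCodes(), and the sample input are all chosen here for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the name and helper method are illustrative only.
public class AreaCodeDemo {
    // Assumed grouping: group 1 is the three-digit area code,
    // group 2 is the remaining seven digits.
    private static final Pattern PHONE = Pattern.compile("\\b(\\d{3})(\\d{7})\\b");

    // Collects the area code of every ten-digit phone number found in the input.
    static List<String> areaCodes(String input) {
        List<String> codes = new ArrayList<>();
        Matcher m = PHONE.matcher(input);
        while (m.find()) {
            codes.add(m.group(1)); // text matched by group 1 only
        }
        return codes;
    }

    public static void main(String[] args) {
        // Assumed sample text; 2339829 has only seven digits and is skipped.
        String input = "3342449027, 2339829, and 6152534734";
        System.out.println(areaCodes(input)); // [334, 615]
    }
}
```

Calling group() or group(0) inside the loop would return the whole ten-digit number instead of just the area code.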
 