13.3.4. Regions
A Matcher looks for matches in the character sequence that it is given
as input. By default, the entire character sequence is considered when
looking for a match. You can control the region of the character
sequence to be used through the method region, which takes a starting
index (inclusive) and an ending index (exclusive) to define the
subsequence in the input character sequence. The methods regionStart
and regionEnd return, respectively, the current start index and the
current end index.
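As a minimal sketch of how a region restricts matching, the following
example searches for a run of digits inside a chosen part of a string.
The class name RegionDemo, the helper findDigits, and the sample input
are illustrative assumptions, not taken from the text.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the names here are for illustration only.
class RegionDemo {
    // Returns the first run of digits found in input[start, end),
    // or null if the region contains no digits.
    static String findDigits(String input, int start, int end) {
        Matcher m = Pattern.compile("\\d+").matcher(input);
        m.region(start, end);   // limit matching to the given region
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String input = "id=12345; code=678";
        Matcher m = Pattern.compile("\\d+").matcher(input);

        // Before region is called, the region is the whole input.
        System.out.println(m.regionStart() + " to " + m.regionEnd());

        // Searching the whole input finds the first number.
        System.out.println(findDigits(input, 0, input.length())); // 12345

        // Starting the region at index 10 skips the first number.
        System.out.println(findDigits(input, 10, input.length())); // 678
    }
}
```

Note that invoking region also resets the matcher, so any earlier match
state is discarded.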
You can control whether a region is considered to be the true start and
end of the input, so that matching with the beginning or end of a line
will work, by invoking useAnchoringBounds with an argument of true (the
default). If you don't want the region boundaries to match the line
anchors (such as ^ and $), use false. The method hasAnchoringBounds
returns the current setting.
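The effect of anchoring bounds can be sketched with a pattern that uses
both ^ and $ against a region in the middle of a string. The class name
AnchoringDemo, the helper findAnchored, and the sample input are
illustrative assumptions.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the names here are for illustration only.
class AnchoringDemo {
    // Tries to find "^bar$" within input[start, end) using the given
    // anchoring-bounds setting.
    static boolean findAnchored(String input, int start, int end,
                                boolean anchoring) {
        Matcher m = Pattern.compile("^bar$").matcher(input);
        m.region(start, end);
        m.useAnchoringBounds(anchoring);
        return m.find();
    }

    public static void main(String[] args) {
        String input = "foo bar baz";   // "bar" occupies indices 4..6

        // Anchoring bounds (the default): the region edges act as the
        // start and end of input, so ^ and $ match there.
        System.out.println(findAnchored(input, 4, 7, true));   // true

        // Non-anchoring bounds: ^ and $ only match at the real start
        // and end of the input, so the search fails.
        System.out.println(findAnchored(input, 4, 7, false));  // false
    }
}
```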
Similarly, you can control whether the bounds of the region are
transparent to matching methods that want to look ahead, look behind, or
detect a boundary. By default bounds are opaque (that is, they will
appear to be hard bounds on the input sequence), but you can change that
with useTransparentBounds. The hasTransparentBounds method returns the
current setting.
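A small sketch may help show the difference between opaque and
transparent bounds. It applies a word-boundary pattern and a look-behind
pattern to a region covering the last three characters of a word. The
class name TransparentDemo, the helper findInRegion, and the sample
input are illustrative assumptions.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the names here are for illustration only.
class TransparentDemo {
    // Searches input[start, end) for regex, with the bounds either
    // transparent or opaque.
    static boolean findInRegion(String regex, String input,
                                int start, int end, boolean transparent) {
        Matcher m = Pattern.compile(regex).matcher(input);
        m.region(start, end);
        m.useTransparentBounds(transparent);
        return m.find();
    }

    public static void main(String[] args) {
        String input = "Madagascar";    // region 7..10 covers "car"

        // Opaque (default): the region start looks like the start of
        // input, so a word boundary is detected before "car".
        System.out.println(
            findInRegion("\\bcar\\b", input, 7, 10, false)); // true

        // Transparent: the matcher can see the 's' before the region,
        // so there is no word boundary at index 7.
        System.out.println(
            findInRegion("\\bcar\\b", input, 7, 10, true));  // false

        // A look-behind can only see the 's' with transparent bounds.
        System.out.println(
            findInRegion("(?<=s)car", input, 7, 10, false)); // false
        System.out.println(
            findInRegion("(?<=s)car", input, 7, 10, true));  // true
    }
}
```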
13.3.5. Efficiency
Suppose you want to parse a string into two parts that are separated
by a comma. The pattern (.*),(.*) is clear and straightforward, but it
is not necessarily the most efficient way to do this. The first .* will
attempt to consume the entire input. The matcher will then have to back
up to the last comma and expand the rest into the second .* group. You
could help this along by being clear that a comma is not part of the
group: ([^,]*),([^,]*). Now it is clear that the matcher should go only
as far as the first comma and stop, which needs no backing up. On the
other hand, the second expression is somewhat less clear to the casual
user of regular expressions.
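The two patterns can be compared with a short sketch. It does not
measure speed; it only shows that both give the same groups when the
input has exactly one comma, and that they behave differently when it
has more. The class name CommaSplitDemo, the helper split, and the
sample inputs are illustrative assumptions.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the names here are for illustration only.
class CommaSplitDemo {
    // Matches the whole input against regex and returns its two groups,
    // or null if the input does not match.
    static String[] split(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2) };
    }

    public static void main(String[] args) {
        String greedy = "(.*),(.*)";
        String negated = "([^,]*),([^,]*)";

        // One comma: both patterns produce "alpha" and "beta".
        String[] a = split(greedy, "alpha,beta");
        String[] b = split(negated, "alpha,beta");
        System.out.println(a[0] + " | " + a[1]);
        System.out.println(b[0] + " | " + b[1]);

        // Two commas: the greedy pattern backs up to the last comma,
        // giving "a,b" and "c"; the negated-class pattern cannot match
        // the whole input, so split returns null.
        String[] c = split(greedy, "a,b,c");
        System.out.println(c[0] + " | " + c[1]);
        System.out.println(split(negated, "a,b,c") == null);
    }
}
```

The second half of the example is a reminder that the two patterns are
interchangeable only when the input is known to contain a single comma.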