BreakIterator identifies the boundaries in text through a set of methods and a field, all of which are integer valued: the methods return boundary positions, and the field is an integer constant. They are detailed in the following table:
Method/Field   Usage
first          Returns the first boundary of the text
next           Returns the boundary following the current boundary
previous       Returns the boundary preceding the current boundary
DONE           A constant (static final int) with the value -1, returned when there are no more boundaries to be found
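The table lists previous, but the rest of this section only walks forward. The following sketch shows both directions side by side. It assumes the JDK's java.text.BreakIterator obtained from getSentenceInstance(Locale.US), uses a hypothetical sample string rather than the paragraph from the text, and the class and method names (SentenceBoundaries, forward, backward) are illustrative only. Backward traversal starts from last(), another BreakIterator method that returns the final boundary (the end of the text).

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper class illustrating first/next and last/previous traversal.
public class SentenceBoundaries {

    // Collects sentences in order by walking the boundaries forward with first()/next().
    static List<String> forward(String text) {
        BreakIterator it = BreakIterator.getSentenceInstance(Locale.US);
        it.setText(text);
        List<String> sentences = new ArrayList<>();
        int start = it.first();
        // Each pair of adjacent boundaries delimits one sentence; stop at DONE.
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            sentences.add(text.substring(start, end));
        }
        return sentences;
    }

    // Collects sentences in reverse order by walking backward with last()/previous().
    static List<String> backward(String text) {
        BreakIterator it = BreakIterator.getSentenceInstance(Locale.US);
        it.setText(text);
        List<String> sentences = new ArrayList<>();
        int end = it.last();
        // previous() also returns DONE once the first boundary has been passed.
        for (int start = it.previous(); start != BreakIterator.DONE; end = start, start = it.previous()) {
            sentences.add(text.substring(start, end));
        }
        return sentences;
    }

    public static void main(String[] args) {
        // Hypothetical sample text, not the paragraph used in the text.
        String text = "Hello world. How are you? Fine!";
        System.out.println(forward(text));
        System.out.println(backward(text));
    }
}
```

Keeping the start of the current sentence in its own variable means every substring is taken between two valid boundaries, so DONE is never passed to substring.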
To use the iterator sequentially, call the first method to obtain the first boundary, and then call the next method repeatedly to find each subsequent boundary. The process terminates when next returns DONE. The following code sequence illustrates this technique using the previously declared sentenceIterator instance:
    // Walk the sentence boundaries of paragraph from first to last.
    int start = sentenceIterator.first();
    for (int end = sentenceIterator.next();
            end != BreakIterator.DONE;
            start = end, end = sentenceIterator.next()) {
        // Adjacent boundaries start and end delimit one sentence.
        System.out.println(start + "-" + end + " ["
                + paragraph.substring(start, end) + "]");
    }
On execution, we get the following output:
0-75 [When determining the end of sentences we need to consider several factors. ]
75-117 [Sentences may end with exclamation marks! ]
117-146 [Or possibly questions marks? ]
146-233 [Within sentences we may find numbers like 3.14159