Java Reference
In-Depth Information
BreakIterator identifies the boundaries found in text using a series of methods and
a field. The methods return integer values, and the field is an integer constant. They are detailed in the following table:
Method or field   Usage
first             Returns the first boundary of the text
next              Returns the boundary following the current boundary
previous          Returns the boundary preceding the current boundary
DONE              A constant with the value -1, returned by next and previous when there are no more boundaries to be found
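As a minimal sketch of how these members behave together, the following class moves forward and backward over a short string. The class name BoundaryNavigation, the navigate helper, the sample text, and the choice of Locale.US are assumptions made for illustration. The last method, which moves to the final boundary, is part of BreakIterator but is not listed in the table.

```java
import java.text.BreakIterator;
import java.util.Locale;

class BoundaryNavigation {

    // Hypothetical helper: returns the values produced by first, next,
    // previous, and a call to next made from the last boundary.
    static int[] navigate(String text) {
        BreakIterator iterator = BreakIterator.getSentenceInstance(Locale.US);
        iterator.setText(text);

        int first = iterator.first();       // boundary at the start of the text
        int second = iterator.next();       // boundary after the first sentence
        int back = iterator.previous();     // steps back to the starting boundary

        iterator.last();                    // jump to the final boundary
        int pastEnd = iterator.next();      // nothing follows, so DONE is returned

        return new int[] {first, second, back, pastEnd};
    }

    public static void main(String[] args) {
        int[] result = navigate("Hello there. How are you?");
        System.out.println("first:    " + result[0]);
        System.out.println("next:     " + result[1]);
        System.out.println("previous: " + result[2]);
        System.out.println("past end: " + result[3]
            + " (DONE is " + BreakIterator.DONE + ")");
    }
}
```

Because DONE is -1 and can never be a valid index into the text, it can safely be compared against any value returned by next or previous.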
To use the iterator in a sequential fashion, the first boundary is identified using the first
method, and then the next method is called repeatedly to find the subsequent boundaries.
The process is terminated when DONE is returned. This technique is illustrated in the next
code sequence, which uses the previously declared sentenceIterator instance:
int boundary = sentenceIterator.first();
while (boundary != BreakIterator.DONE) {
    int begin = boundary;
    // Advance to the next boundary; DONE means the text is exhausted
    boundary = sentenceIterator.next();
    int end = boundary;
    if (end == BreakIterator.DONE) {
        break;
    }
    // Print the boundary pair and the sentence between them
    System.out.println(begin + "-" + end + " ["
        + paragraph.substring(begin, end) + "]");
}
On execution, we get the following output:
0-75 [When determining the end of sentences we need to consider several factors. ]
75-117 [Sentences may end with exclamation marks! ]
117-146 [Or possibly questions marks? ]
146-233 [Within sentences we may find numbers like 3.14159
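
The loop shown earlier depends on the previously declared sentenceIterator and paragraph variables. As a self-contained sketch, the same first/next traversal can be wrapped in a helper that collects the sentences into a list. The class name SentenceSplitter, the split method, and the use of Locale.US are assumptions for illustration and are not part of the original example.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class SentenceSplitter {

    // Hypothetical helper: returns the text between each pair of
    // consecutive sentence boundaries, trailing whitespace included.
    static List<String> split(String text) {
        BreakIterator iterator = BreakIterator.getSentenceInstance(Locale.US);
        iterator.setText(text);

        List<String> sentences = new ArrayList<>();
        int begin = iterator.first();
        // Stop as soon as next reports that no boundaries remain
        for (int end = iterator.next();
                end != BreakIterator.DONE;
                begin = end, end = iterator.next()) {
            sentences.add(text.substring(begin, end));
        }
        return sentences;
    }

    public static void main(String[] args) {
        String paragraph = "Sentences may end with exclamation marks! "
            + "Or possibly questions marks?";
        for (String sentence : split(paragraph)) {
            System.out.println("[" + sentence + "]");
        }
    }
}
```

Each returned sentence keeps the whitespace that follows it, which matches the trailing space visible inside the brackets in the output above. Call trim on each element if that whitespace is not wanted.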