Java Reference
In-Depth Information
while (boundary != BreakIterator.DONE) {
    int begin = boundary;
    boundary = wordIterator.next();
    int end = boundary;
    // Check for DONE before printing, so the final iteration
    // does not emit a dangling start offset
    if (end == BreakIterator.DONE) break;
    System.out.println(begin + "-" + end + " ["
        + text.substring(begin, end) + "]");
}
The output follows. Each line shows the start and end offsets of a segment, with brackets delineating its text:
0-5 [Let's]
5-6 [ ]
6-11 [pause]
11-12 [,]
12-13 [ ]
13-16 [and]
16-17 [ ]
17-21 [then]
21-22 [ ]
22-29 [reflect]
29-30 [.]
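This technique does a fairly good job of identifying the basic tokens. Note that whitespace and punctuation are returned as segments of their own, while the contraction Let's is kept together as a single token.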
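If only the word tokens are wanted, the whitespace and punctuation segments can be filtered out as the boundaries are walked. The following is a minimal sketch, not the book's own code: the class name WordTokens and the method tokenize are hypothetical, the sample sentence is inferred from the offsets in the output above, and Locale.US is an assumption about the locale in use. A segment is kept only when its first character is a letter or digit.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class WordTokens {
    // Hypothetical helper: returns only the segments that start with a
    // letter or digit, dropping whitespace and punctuation segments.
    static List<String> tokenize(String text) {
        // Locale.US is an assumption; the original may use the default locale
        BreakIterator wordIterator = BreakIterator.getWordInstance(Locale.US);
        wordIterator.setText(text);
        List<String> tokens = new ArrayList<>();
        int begin = wordIterator.first();
        // Walk consecutive boundary pairs until the iterator reports DONE
        for (int end = wordIterator.next(); end != BreakIterator.DONE;
                begin = end, end = wordIterator.next()) {
            String segment = text.substring(begin, end);
            if (Character.isLetterOrDigit(segment.codePointAt(0))) {
                tokens.add(segment);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Sample sentence inferred from the offsets shown in the output above
        System.out.println(tokenize("Let's pause, and then reflect."));
    }
}
```

For the sample sentence this should leave the five word tokens Let's, pause, and, then, and reflect. Testing only the first character is a simple heuristic; it would discard segments such as a number that starts with a sign or currency symbol, so a stricter or looser test may be appropriate depending on the text.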