NLP tokenizer APIs
In this section, we will demonstrate several different tokenization techniques using the
OpenNLP, Stanford, and LingPipe APIs. Although a number of other APIs are available,
we restrict the demonstration to these three. The examples will give you an idea of
what techniques are available.
We will use a string called paragraph to illustrate these techniques. The string includes a
line break of the kind that can occur in unexpected places in real text. It is defined here:
private String paragraph = "Let's pause, \nand then "
    + "reflect.";
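To see what the embedded line break does to this string before any NLP library is involved, here is a minimal sketch that uses only the JDK. It is not one of the three APIs covered in this section. The class name ParagraphDemo and the helper whitespaceTokens are hypothetical names introduced for illustration, and splitting on whitespace is only a baseline for comparison.

```java
import java.util.Arrays;
import java.util.List;

public class ParagraphDemo {
    // Same text as the paragraph field, declared static so the demo stands alone.
    static final String PARAGRAPH = "Let's pause, \nand then "
            + "reflect.";

    // Hypothetical helper: splits on runs of whitespace.
    // \s matches the embedded newline as well as ordinary spaces.
    static List<String> whitespaceTokens(String text) {
        return Arrays.asList(text.trim().split("\\s+"));
    }

    public static void main(String[] args) {
        // Print each token in brackets so stray whitespace would be visible.
        for (String token : whitespaceTokens(PARAGRAPH)) {
            System.out.println("[" + token + "]");
        }
    }
}
```

Because the space and the newline after "pause," form a single run of whitespace, the split yields five tokens. Punctuation stays attached to the neighboring word ("pause," and "reflect."), and "Let's" is left whole. How to handle those cases is where dedicated tokenizers typically differ from a plain whitespace split.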