Java Reference
In-Depth Information
A TokenizerModel class hides the model and is used to instantiate the tokenizer. The
model must have been previously trained. In the next example, the tokenizer is instanti-
ated using the model found in the en-token.bin file. This model has been trained to
work with common English text.
The location of the model file is returned by the method getModelDir , which you will
need to implement. The returned value is dependent on where the models are stored on
your system. Many of these models can be found at http://opennlp.sourceforge.net/mod-
els-1.5/ .
After the instance of a FileInputStream class is created, the input stream is used as
the argument of the TokenizerModel constructor. The tokenize method will gener-
ate an array of strings. This is followed by code to display the tokens:
try {
InputStream modelInputStream = new FileInputStream(
new File(getModelDir(), "en-token.bin"));
TokenizerModel model = new
TokenizerModel(modelInputStream);
Tokenizer tokenizer = new TokenizerME(model);
String tokens[] = tokenizer.tokenize(paragraph);
for (String token : tokens) {
System.out.println(token);
}
} catch (IOException ex) {
// Handle the exception
}
The output is as follows:
Let
's
pause
,
and
then
reflect
.
Search WWH ::




Custom Search