Using OpenNLP POS taggers
OpenNLP provides several classes in support of POS tagging. We will demonstrate how to
use the POSTaggerME class to perform basic tagging and the ChunkerME class to perform
chunking. Chunking involves grouping related words according to their types. This can
provide additional insight into the structure of a sentence. We will also examine the
creation and use of a POSDictionary instance.
Using the OpenNLP POSTaggerME class for POS taggers
The OpenNLP POSTaggerME class uses maximum entropy to process the tags. The tagger
determines the type of tag based on the word itself and the word's context. Any given
word may have multiple tags associated with it. The tagger uses a probability model to
determine the specific tag to be assigned.
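To make the idea of context-dependent tag selection concrete, here is a minimal sketch in plain Java. It is not how POSTaggerME computes its scores. It only illustrates picking the most probable tag for a word given the previous tag. The class name TagChoiceSketch, the method bestTag, the fallback tag, and all probability values are hypothetical and invented for illustration.

```java
import java.util.Map;

class TagChoiceSketch {
    // Hypothetical scores P(tag | previous tag, word). The numbers are invented
    // for illustration and do not come from any OpenNLP model.
    static final Map<String, Map<String, Double>> SCORES = Map.of(
        "DT|book", Map.of("NN", 0.9, "VB", 0.1),   // "the book"
        "TO|book", Map.of("NN", 0.2, "VB", 0.8));  // "to book"

    // Returns the candidate tag with the highest score for this context,
    // or "NN" as an arbitrary fallback when the context is unknown.
    static String bestTag(String previousTag, String word) {
        Map<String, Double> candidates = SCORES.get(previousTag + "|" + word);
        if (candidates == null) {
            return "NN";
        }
        return candidates.entrySet().stream()
            .max(Map.Entry.comparingByValue())
            .map(Map.Entry::getKey)
            .orElse("NN");
    }

    public static void main(String[] args) {
        System.out.println(bestTag("DT", "book")); // noun reading after a determiner
        System.out.println(bestTag("TO", "book")); // verb reading after "to"
    }
}
```

The same word receives a different tag depending on what precedes it, which is the behavior the paragraph above describes. A real tagger derives such probabilities from a trained model and uses many more features than the previous tag.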
POS models are loaded from a file. The en-pos-maxent.bin model is used frequently and
is based on the Penn TreeBank tag set. Various pretrained POS models for OpenNLP can be
found at http://opennlp.sourceforge.net/models-1.5/.
We start with a try-with-resources block, which closes the input stream automatically
and lets us handle any IOException that might be generated when loading a model. We use
the en-pos-maxent.bin file for the model:
try (InputStream modelIn = new FileInputStream(
        new File(getModelDir(), "en-pos-maxent.bin"))) {
    …
} catch (IOException e) {
    // Handle exceptions
}
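The loading pattern above can be tried in isolation, without OpenNLP on the classpath. The following is a minimal sketch using only the JDK. The class name ModelStreamSketch and the method canOpen are hypothetical helpers, and the path passed in main assumes the model file sits in the working directory.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

class ModelStreamSketch {
    // Hypothetical helper: returns true if the file at the given path can be
    // opened and read, false if an IOException occurs (for example, when the
    // file does not exist).
    static boolean canOpen(String path) {
        // The stream declared here is closed automatically when the block exits.
        try (InputStream modelIn = new FileInputStream(path)) {
            modelIn.read(); // read one byte to confirm the stream is usable
            return true;
        } catch (IOException e) {
            System.err.println("Could not load model: " + e.getMessage());
            return false;
        }
    }

    public static void main(String[] args) {
        // Assumed location of the model file; adjust to your setup.
        System.out.println(canOpen("en-pos-maxent.bin"));
    }
}
```

A missing file raises FileNotFoundException, which is a subclass of IOException, so the single catch clause covers both a missing model and a read failure.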
Next, create the POSModel and POSTaggerME instances as shown here:
POSModel model = new POSModel(modelIn);
POSTaggerME tagger = new POSTaggerME(model);
The tag method can now be applied to the tagger, with the tokenized text to be processed
passed as its argument: