Using OpenNLP POS taggers
OpenNLP provides several classes in support of POS tagging. We will demonstrate how to
use the POSTaggerME class to perform basic tagging and the ChunkerME class to perform
chunking. Chunking involves grouping related words according to their types. This
can provide additional insight into the structure of a sentence. We will also examine the
creation and use of a POSDictionary instance.
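To make the idea of chunking concrete, here is a minimal sketch that groups consecutive words into noun-phrase chunks based on their POS tags. It does not use ChunkerME; the ToyChunker class, its hand-picked set of noun-phrase tags, and the sample sentence are all assumptions made for illustration, and a trained chunker uses a statistical model rather than a fixed rule.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical toy class: illustrates chunking only, not OpenNLP's ChunkerME.
public class ToyChunker {
    // Penn Treebank tags that this toy rule treats as noun-phrase material (an assumption).
    private static final Set<String> NP_TAGS = Set.of("DT", "JJ", "NN", "NNS", "NNP");

    /** Groups runs of consecutive noun-phrase tokens into "[NP ...]" chunks. */
    public static List<String> chunk(String[] tokens, String[] tags) {
        List<String> result = new ArrayList<>();
        List<String> current = new ArrayList<>();
        for (int i = 0; i < tokens.length; i++) {
            if (NP_TAGS.contains(tags[i])) {
                current.add(tokens[i]);      // extend the open chunk
            } else {
                flush(current, result);      // close any open chunk
                result.add(tokens[i]);       // token stays outside a chunk
            }
        }
        flush(current, result);
        return result;
    }

    // Emits the open chunk, if any, and resets it.
    private static void flush(List<String> current, List<String> result) {
        if (!current.isEmpty()) {
            result.add("[NP " + String.join(" ", current) + "]");
            current.clear();
        }
    }

    public static void main(String[] args) {
        String[] tokens = {"The", "quick", "fox", "jumped", "over", "the", "dog"};
        String[] tags = {"DT", "JJ", "NN", "VBD", "IN", "DT", "NN"};
        // Prints: [[NP The quick fox], jumped, over, [NP the dog]]
        System.out.println(chunk(tokens, tags));
    }
}
```

The tags are supplied by hand here; in practice they would come from a POS tagger, which is why tagging is normally performed before chunking.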
Using the OpenNLP POSTaggerME class for POS taggers
The OpenNLP POSTaggerME class uses a maximum entropy model to assign tags. The tagger
determines the tag based on the word itself and the word's context. Any given word may
have multiple possible tags associated with it. The tagger uses a probability model to
determine the specific tag to be assigned.
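The following sketch shows, in a deliberately simplified form, how word identity and context can combine to select one tag for an ambiguous word. It is not a maximum entropy model and not how POSTaggerME is implemented: it simply multiplies two hand-made probability tables. The ToyTagChooser class and every number in it are invented for illustration.

```java
import java.util.Map;

// Hypothetical toy class: picks a tag by combining word and context scores.
public class ToyTagChooser {
    // Invented P(tag | word): "book" can be a noun (NN) or a verb (VB).
    static final Map<String, Map<String, Double>> WORD_TAG = Map.of(
        "book", Map.of("NN", 0.7, "VB", 0.3));

    // Invented P(tag | previous tag): the context signal.
    static final Map<String, Map<String, Double>> PREV_TAG = Map.of(
        "DT", Map.of("NN", 0.9, "VB", 0.1),   // after a determiner, nouns are likely
        "TO", Map.of("NN", 0.1, "VB", 0.9));  // after "to", verbs are likely

    /** Returns the candidate tag with the highest combined score. */
    public static String choose(String word, String prevTag) {
        String best = null;
        double bestScore = -1.0;
        for (Map.Entry<String, Double> e : WORD_TAG.get(word).entrySet()) {
            double context = PREV_TAG.get(prevTag).getOrDefault(e.getKey(), 0.0);
            double score = e.getValue() * context;
            if (score > bestScore) {
                bestScore = score;
                best = e.getKey();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println(choose("book", "DT")); // "the book"  -> NN
        System.out.println(choose("book", "TO")); // "to book"   -> VB
    }
}
```

The same word receives different tags depending on what precedes it, which is the behavior the tagger's probability model provides, using far richer features than this sketch.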
POS models are loaded from a file. The en-pos-maxent.bin model is used frequently
and is based on the Penn TreeBank tag set. Various pretrained POS models for OpenNLP
can be found at http://opennlp.sourceforge.net/models-1.5/ .
We start with a try-with-resources block, which closes the input stream automatically,
and catch any IOException that might be generated when loading a model, as shown here.
We use the en-pos-maxent.bin file for the model:
try (InputStream modelIn = new FileInputStream(
        new File(getModelDir(), "en-pos-maxent.bin"))) {
    // Create the model and tagger here, while the stream is open
}
catch (IOException e) {
    // Handle exceptions
}
Next, inside the try block, create the POSModel and POSTaggerME instances as shown here:
POSModel model = new POSModel(modelIn);
POSTaggerME tagger = new POSTaggerME(model);
The tag method can now be applied to the tagger, with the text to be processed passed
as an array of tokens as its argument.
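The tag method takes an array of tokens and returns an array of tags of the same length, one tag per token. The sketch below shows one way to pair the two arrays for display. The TagFormatter class is a hypothetical helper, and the tags array is hard-coded as a stand-in for the tagger's output so that the example runs without the OpenNLP library or a model file.

```java
// Hypothetical helper: pairs each token with its tag as "word/TAG".
public class TagFormatter {
    public static String format(String[] tokens, String[] tags) {
        if (tokens.length != tags.length) {
            throw new IllegalArgumentException("tokens and tags must have equal length");
        }
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < tokens.length; i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(tokens[i]).append('/').append(tags[i]);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String[] tokens = {"The", "dog", "barked"};
        // With OpenNLP available, this array would come from:
        //     String[] tags = tagger.tag(tokens);
        // The values below are assumed for illustration only.
        String[] tags = {"DT", "NN", "VBD"};
        System.out.println(format(tokens, tags)); // The/DT dog/NN barked/VBD
    }
}
```

Because tag expects tokens and not a raw string, the text needs to be split into words first, for example with a tokenizer.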