Detecting Part of Speech - Natural Language Processing with Java

Using OpenNLP POS taggers

OpenNLP provides several classes in support of POS tagging. We will demonstrate how to
use the POSTaggerME class to perform basic tagging and the ChunkerME class to perform
chunking. Chunking involves grouping related words according to their types. This can
provide additional insight into the structure of a sentence. We will also examine the
creation and use of a POSDictionary instance.
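
To make the idea of chunking concrete, here is a minimal sketch that does not use OpenNLP at all. It assumes the words have already been tagged with Penn Treebank tags and simply groups consecutive determiner, adjective, and noun tokens into bracketed noun-phrase chunks. The class name ToyChunker, the method chunkNounPhrases, and the grouping rule itself are illustrative assumptions; the ChunkerME class relies on a trained model rather than a fixed rule like this one.

```java
import java.util.ArrayList;
import java.util.List;

public class ToyChunker {

    // Hypothetical rule: determiners, adjectives, and nouns may form a noun phrase.
    static boolean isNounPhraseTag(String tag) {
        return tag.equals("DT") || tag.startsWith("JJ") || tag.startsWith("NN");
    }

    // Groups consecutive noun-phrase tokens into "[NP ...]" chunks;
    // every other token is emitted on its own.
    static List<String> chunkNounPhrases(String[] tokens, String[] tags) {
        List<String> chunks = new ArrayList<>();
        StringBuilder current = null;
        for (int i = 0; i < tokens.length; i++) {
            if (isNounPhraseTag(tags[i])) {
                if (current == null) {
                    current = new StringBuilder("[NP");
                }
                current.append(' ').append(tokens[i]);
            } else {
                if (current != null) {
                    chunks.add(current.append(']').toString());
                    current = null;
                }
                chunks.add(tokens[i]);
            }
        }
        // Close a chunk that runs to the end of the sentence.
        if (current != null) {
            chunks.add(current.append(']').toString());
        }
        return chunks;
    }

    public static void main(String[] args) {
        String[] tokens = {"The", "quick", "fox", "jumped", "over", "the", "fence"};
        // Hand-assigned tags for illustration, not model output.
        String[] tags = {"DT", "JJ", "NN", "VBD", "IN", "DT", "NN"};
        System.out.println(chunkNounPhrases(tokens, tags));
    }
}
```

Even this crude rule shows why chunking adds structural insight: the seven individual tokens collapse into two noun phrases separated by a verb and a preposition.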

Using the OpenNLP POSTaggerME class for POS tagging

The OpenNLP POSTaggerME class uses a maximum entropy model to assign tags. The tagger
determines the tag based on the word itself and the word's context. A given word may
have several possible tags associated with it. The tagger uses a probability model to
determine the specific tag to be assigned.
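
The idea of choosing one tag from several candidates can be illustrated with a small sketch. This is not how POSTaggerME works internally; a maximum entropy model computes its probabilities from features of the word and its context. Here the probabilities are made-up values stored in a map, and the names TagChooser and mostProbableTag are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class TagChooser {

    // Returns the tag with the highest probability, or null if there are no candidates.
    static String mostProbableTag(Map<String, Double> candidates) {
        String best = null;
        double bestProbability = Double.NEGATIVE_INFINITY;
        for (Map.Entry<String, Double> entry : candidates.entrySet()) {
            if (entry.getValue() > bestProbability) {
                bestProbability = entry.getValue();
                best = entry.getKey();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Made-up probabilities for the word "book" in "I want to book a flight".
        Map<String, Double> candidates = new LinkedHashMap<>();
        candidates.put("NN", 0.25); // noun
        candidates.put("VB", 0.70); // verb, base form
        candidates.put("JJ", 0.05); // adjective
        System.out.println("book -> " + mostProbableTag(candidates));
    }
}
```

In a real tagger the candidate scores change with the surrounding words, which is why the same word can receive different tags in different sentences.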

POS models are loaded from a file. The en-pos-maxent.bin model is used frequently
and is based on the Penn Treebank tag set. Various pretrained POS models for OpenNLP
can be found at http://opennlp.sourceforge.net/models-1.5/.

We start with a try-with-resources block, which closes the input stream automatically,
and catch any IOException that might be thrown when loading the model. We use the
en-pos-maxent.bin file for the model:

try (InputStream modelIn = new FileInputStream(
        new File(getModelDir(), "en-pos-maxent.bin"))) {
    …
} catch (IOException e) {
    // Handle exceptions
}
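
The behavior of this block when the model file is missing can be checked with a small sketch that uses only java.io and does not touch OpenNLP. The names ModelFileCheck and canOpen are hypothetical, and the file name in main is only an example. A missing file causes FileInputStream to throw a FileNotFoundException, which is a subclass of IOException and is therefore handled by the same catch clause.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

public class ModelFileCheck {

    // Returns true if the file can be opened and contains at least one byte.
    // The stream is closed automatically when the try block exits.
    static boolean canOpen(File modelFile) {
        try (InputStream modelIn = new FileInputStream(modelFile)) {
            return modelIn.read() != -1;
        } catch (IOException e) {
            // Covers FileNotFoundException as well as read errors.
            return false;
        }
    }

    public static void main(String[] args) {
        File modelFile = new File("en-pos-maxent.bin");
        System.out.println(modelFile + " readable: " + canOpen(modelFile));
    }
}
```

A check like this only confirms that the file is readable; whether it is a valid model is determined when the model object is constructed from the stream.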

Next, inside the try block, create the POSModel and POSTaggerME instances as shown here:

POSModel model = new POSModel(modelIn);
POSTaggerME tagger = new POSTaggerME(model);

The tag method can now be applied to the tagger using the text to be processed as its
argument:
