Detecting Part of Speech - Natural Language Processing with Java

Using Stanford POS taggers

In this section, we will examine two different approaches supported by the Stanford API to perform tagging. The first technique uses the MaxentTagger class. As its name implies, it uses maximum entropy to find the POS. We will also use this class to demonstrate a model designed to handle textese-type text. The second approach uses a pipeline with annotators. The English taggers use the Penn Treebank English POS tag set.
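The Penn Treebank tag set labels each token with a short code, such as NN for a singular noun or VBD for a past-tense verb. The JDK-only sketch below makes that concrete. It maps a hand-picked subset of tags to descriptions and renders words and tags as word/TAG pairs.

The class name `PennTags`, the slash separator, and the chosen subset are illustrative assumptions. The sketch does not call the Stanford API, and the tags in `main` are assigned by hand rather than produced by a tagger.

```java
import java.util.List;
import java.util.Map;

// Illustration only: a hand-picked subset of Penn Treebank tags, not the full tag set.
final class PennTags {
    static final Map<String, String> DESCRIPTIONS = Map.of(
            "DT", "determiner",
            "IN", "preposition or subordinating conjunction",
            "JJ", "adjective",
            "NN", "noun, singular or mass",
            "NNS", "noun, plural",
            "NNP", "proper noun, singular",
            "RB", "adverb",
            "VB", "verb, base form",
            "VBD", "verb, past tense",
            "VBZ", "verb, 3rd person singular present");

    // Looks up a tag, falling back for anything outside this small subset.
    static String describe(String tag) {
        return DESCRIPTIONS.getOrDefault(tag, "unknown tag");
    }

    // Joins parallel word and tag lists into space-separated "word/TAG" tokens.
    static String render(List<String> words, List<String> tags) {
        if (words.size() != tags.size()) {
            throw new IllegalArgumentException("words and tags must have the same length");
        }
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < words.size(); i++) {
            if (i > 0) {
                out.append(' ');
            }
            out.append(words.get(i)).append('/').append(tags.get(i));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Tags assigned by hand for illustration; this is not tagger output.
        List<String> words = List.of("The", "cat", "sat", "on", "the", "mat");
        List<String> tags = List.of("DT", "NN", "VBD", "IN", "DT", "NN");
        System.out.println(render(words, tags)); // The/DT cat/NN sat/VBD on/IN the/DT mat/NN
        System.out.println("VBD: " + describe("VBD")); // VBD: verb, past tense
    }
}
```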

Using Stanford MaxentTagger

The MaxentTagger class uses a model to perform the tagging task. A number of models come bundled with the API, all with the file extension .tagger. They include English, Chinese, Arabic, French, and German models. The English models are listed here. The prefix wsj refers to models based on the Wall Street Journal. The other terms refer to techniques used to train the model; these concepts are not covered here:

• wsj-0-18-bidirectional-distsim.tagger

• wsj-0-18-bidirectional-nodistsim.tagger

• wsj-0-18-caseless-left3words-distsim.tagger

• wsj-0-18-left3words-distsim.tagger

• wsj-0-18-left3words-nodistsim.tagger

• english-bidirectional-distsim.tagger

• english-caseless-left3words-distsim.tagger

• english-left3words-distsim.tagger
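Each file name above is assembled from hyphen-separated parts. The sketch below splits a name into those parts so that a program could select a model by property. The parts are the corpus prefix, an optional caseless flag, bidirectional or left3words, and distsim or nodistsim.

The helper only reads the name; it says nothing about what each training technique does. The class `TaggerModelName` and its parsing rules are hypothetical, derived from the eight names listed, and are not part of the Stanford API.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: splits a bundled model's file name into the parts it is
// assembled from. It reads the name only; it does not load or inspect the model.
final class TaggerModelName {
    final String corpus;        // first part, e.g. "wsj" or "english"
    final boolean caseless;     // name contains a "caseless" part
    final String architecture;  // "bidirectional", "left3words", or "unknown"
    final boolean distsim;      // "distsim" part present ("nodistsim" counts as false)

    private TaggerModelName(String corpus, boolean caseless, String architecture, boolean distsim) {
        this.corpus = corpus;
        this.caseless = caseless;
        this.architecture = architecture;
        this.distsim = distsim;
    }

    static TaggerModelName parse(String fileName) {
        String suffix = ".tagger";
        if (!fileName.endsWith(suffix)) {
            throw new IllegalArgumentException("not a .tagger file: " + fileName);
        }
        String base = fileName.substring(0, fileName.length() - suffix.length());
        List<String> parts = Arrays.asList(base.split("-"));
        String architecture = parts.contains("bidirectional") ? "bidirectional"
                : parts.contains("left3words") ? "left3words"
                : "unknown";
        // contains() compares whole parts, so "nodistsim" never matches "distsim".
        return new TaggerModelName(parts.get(0), parts.contains("caseless"),
                architecture, parts.contains("distsim"));
    }

    @Override
    public String toString() {
        return corpus + (caseless ? " caseless" : "") + " " + architecture
                + (distsim ? " distsim" : " nodistsim");
    }

    public static void main(String[] args) {
        String[] names = {
            "wsj-0-18-bidirectional-distsim.tagger",
            "wsj-0-18-caseless-left3words-distsim.tagger",
            "english-left3words-distsim.tagger"
        };
        for (String name : names) {
            System.out.println(name + " -> " + parse(name));
        }
    }
}
```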

The example reads in a series of sentences from a file. Each sentence is then processed, and various ways of accessing and displaying the words and tags are illustrated.

We start with a try-with-resources block to deal with IO exceptions. The wsj-0-18-bidirectional-distsim.tagger file is used to create an instance of the MaxentTagger class.
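A try-with-resources statement closes every AutoCloseable declared in its header when the block exits. A catch clause on the same statement handles any IO failure. The JDK-only sketch below shows that pattern by reading sentences.txt.

It assumes one sentence per line. The Stanford tokenizer does not need that assumption because it splits sentences itself. The class and method names are hypothetical, and the MaxentTagger construction is left as a comment so the sketch compiles without the Stanford jar.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

final class SentenceFileReader {
    // Reads non-blank lines from the file. The try-with-resources statement closes
    // the reader automatically, whether the loop finishes normally or throws.
    static List<String> readSentences(Path path) {
        List<String> sentences = new ArrayList<>();
        try (BufferedReader reader = Files.newBufferedReader(path, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                String trimmed = line.trim();
                if (!trimmed.isEmpty()) {
                    sentences.add(trimmed);
                }
            }
        } catch (IOException ex) {
            // Report the problem and return whatever was read before the failure.
            System.err.println("Could not read " + path + ": " + ex.getMessage());
        }
        return sentences;
    }

    public static void main(String[] args) {
        // In the Stanford example, this is where a MaxentTagger would be created
        // from a .tagger model file; it is omitted so the sketch needs only the JDK.
        for (String sentence : readSentences(Paths.get("sentences.txt"))) {
            System.out.println(sentence);
        }
    }
}
```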

A List of List instances of HasWord objects is created using the MaxentTagger class's tokenizeText method. The sentences are read in from the file sentences.txt. The HasWord interface represents words and contains two methods: setWord and word. The latter returns a word as a string. Each sentence is represented by a List instance of HasWord objects.
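In the Stanford API, this nested list comes from MaxentTagger's tokenizeText method, which requires the tagger jar on the classpath. The JDK-only sketch below shows the shape of the result without that dependency.

It uses a hypothetical `WordLike` interface with the same two methods the text describes for HasWord. It also uses a deliberately naive tokenizer that finds tokens with a regular expression and ends a sentence at ".", "!", or "?". This is not Stanford's tokenizer; for example, it keeps a contraction such as didn't as a single token.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in with the two methods the text describes for HasWord.
interface WordLike {
    String word();

    void setWord(String word);
}

final class SimpleWord implements WordLike {
    private String word;

    SimpleWord(String word) {
        this.word = word;
    }

    @Override
    public String word() {
        return word;
    }

    @Override
    public void setWord(String word) {
        this.word = word;
    }

    @Override
    public String toString() {
        return word;
    }
}

final class NaiveTokenizer {
    // A token is a run of letters, digits, or apostrophes, or a single ASCII punctuation mark.
    private static final Pattern TOKEN = Pattern.compile("[\\p{L}\\p{N}']+|\\p{Punct}");

    // Returns one inner list per sentence, the same nesting as List<List<HasWord>>.
    // The caller owns the Reader and is responsible for closing it.
    static List<List<WordLike>> tokenizeText(Reader input) throws IOException {
        StringBuilder text = new StringBuilder();
        char[] buffer = new char[4096];
        int n;
        while ((n = input.read(buffer)) != -1) {
            text.append(buffer, 0, n);
        }

        List<List<WordLike>> sentences = new ArrayList<>();
        List<WordLike> current = new ArrayList<>();
        Matcher matcher = TOKEN.matcher(text);
        while (matcher.find()) {
            String token = matcher.group();
            current.add(new SimpleWord(token));
            // Naive rule: ".", "!" or "?" ends the current sentence.
            if (token.equals(".") || token.equals("!") || token.equals("?")) {
                sentences.add(current);
                current = new ArrayList<>();
            }
        }
        if (!current.isEmpty()) {
            sentences.add(current); // trailing words with no closing punctuation
        }
        return sentences;
    }

    public static void main(String[] args) throws IOException {
        // Prints [The, cat, sat, .] then [Did, it, move, ?] then [It, didn't]
        try (Reader reader = new StringReader("The cat sat. Did it move? It didn't")) {
            for (List<WordLike> sentence : tokenizeText(reader)) {
                System.out.println(sentence);
            }
        }
    }
}
```

Leaving the Reader open inside tokenizeText keeps ownership with the caller. The caller can then manage it with the same try-with-resources block that wraps the rest of the example.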
