Detecting Part of Speech - Natural Language Processing with Java

Using OpenNLP chunking

The process of chunking involves breaking a sentence into parts, or chunks. These chunks can then be annotated with tags. We will use the ChunkerME class to illustrate how this is accomplished. This class uses a model loaded into a ChunkerModel instance. The ChunkerME class's chunk method performs the actual chunking process. We will also examine the use of the chunkAsSpans method to return information about the span of these chunks. This allows us to see how long a chunk is and which elements make up the chunk.
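
OpenNLP chunkers emit one tag per token in a BIO scheme: for example, B-NP begins a noun phrase, I-NP continues it, and O marks a token outside any chunk. chunkAsSpans groups those per-token tags into Span objects that carry a start index, an exclusive end index, and a chunk type. The following is a minimal stdlib sketch of that grouping step, not OpenNLP's own code. The ChunkSpans class, the ChunkSpan record, the toSpans method, and the hand-written tag array are hypothetical stand-ins for real chunker output, and treating a stray I- tag as the start of a new chunk is an assumption of this sketch. It requires Java 16 or later for the record.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ChunkSpans {

    // Hypothetical stand-in for opennlp.tools.util.Span: start is inclusive, end is exclusive.
    record ChunkSpan(int start, int end, String type) {
        int length() {
            return end - start;
        }
    }

    // Groups per-token BIO tags (B-XX, I-XX, O) into typed spans.
    static List<ChunkSpan> toSpans(String[] chunkTags) {
        List<ChunkSpan> spans = new ArrayList<>();
        int start = -1;      // index where the open chunk began, or -1 if no chunk is open
        String type = null;  // type of the open chunk, e.g. "NP"
        for (int i = 0; i < chunkTags.length; i++) {
            String tag = chunkTags[i];
            boolean begins = tag.startsWith("B-");
            // An I- tag only continues the open chunk if the types match (assumption).
            boolean continues = tag.startsWith("I-") && start >= 0 && tag.substring(2).equals(type);
            if (!continues && start >= 0) {
                // Close the chunk that was open before this token.
                spans.add(new ChunkSpan(start, i, type));
                start = -1;
                type = null;
            }
            if (begins || (tag.startsWith("I-") && !continues)) {
                // Open a new chunk at this token.
                start = i;
                type = tag.substring(2);
            }
        }
        if (start >= 0) {
            // Close a chunk that runs to the end of the sentence.
            spans.add(new ChunkSpan(start, chunkTags.length, type));
        }
        return spans;
    }

    public static void main(String[] args) {
        // Hand-written tags for illustration, not actual en-chunker.bin output.
        String[] tokens = {"The", "voyage", "of", "the", "Abraham", "Lincoln", "was", "long", "."};
        String[] chunkTags = {"B-NP", "I-NP", "B-PP", "B-NP", "I-NP", "I-NP", "B-VP", "B-ADJP", "O"};
        for (ChunkSpan span : toSpans(chunkTags)) {
            String words = String.join(" ", Arrays.copyOfRange(tokens, span.start(), span.end()));
            System.out.println(span.type() + " [" + span.start() + ".." + span.end() + ") length "
                    + span.length() + ": " + words);
        }
    }
}
```

For the sample tags, the third span is an NP covering indexes 3 to 6 (exclusive), so its length is 3 and it contains "the Abraham Lincoln"; the final "." is tagged O and belongs to no span.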

We will use the en-pos-maxent.bin file to create a model for the POSTaggerME instance. We need this instance to tag the text, as we did in the Using OpenNLP POSTaggerME class for POS taggers section earlier in this chapter, because the chunker works on both the tokens and their POS tags. We will also use the en-chunker.bin file to create a ChunkerModel instance to be used with the ChunkerME instance.
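
The chunker consumes two parallel arrays: the tokens and the POS tags the tagger assigned to them, one tag per token at the same index. The stdlib sketch below makes that alignment explicit and checks it. The TaggedTokens class, its pair method, and the hand-written Penn Treebank tags are hypothetical stand-ins for real tagger output, and the word/TAG display is simply one common convention.

```java
import java.util.ArrayList;
import java.util.List;

public class TaggedTokens {

    // Joins each token with the POS tag at the same index, e.g. "voyage/NN".
    static List<String> pair(String[] tokens, String[] tags) {
        // A tagger returns exactly one tag per token; a length mismatch means misaligned arrays.
        if (tokens.length != tags.length) {
            throw new IllegalArgumentException(
                    "tokens and tags differ in length: " + tokens.length + " vs " + tags.length);
        }
        List<String> pairs = new ArrayList<>(tokens.length);
        for (int i = 0; i < tokens.length; i++) {
            pairs.add(tokens[i] + "/" + tags[i]);
        }
        return pairs;
    }

    public static void main(String[] args) {
        // Hand-written tags for illustration, not en-pos-maxent.bin output.
        String[] tokens = {"The", "voyage", "was", "long", "."};
        String[] tags = {"DT", "NN", "VBD", "JJ", "."};
        // prints: The/DT voyage/NN was/VBD long/JJ ./.
        System.out.println(String.join(" ", pair(tokens, tags)));
    }
}
```

Failing fast on a length mismatch is a design choice of this sketch: passing misaligned arrays to a chunker would silently attach tags to the wrong words.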

These models are created using input streams, as shown in the following example. We use a try-with-resources block to open and close the files and to deal with any exceptions that may be thrown:

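import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

// getModelDir() returns the directory holding the model files;
// new File(dir, name) avoids hard-coding a Windows path separator
try (
        InputStream posModelStream = new FileInputStream(
                new File(getModelDir(), "en-pos-maxent.bin"));
        InputStream chunkerStream = new FileInputStream(
                new File(getModelDir(), "en-chunker.bin"))) {
    …
} catch (IOException ex) {
    // Handle exceptions
}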
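
A try-with-resources block closes every resource it declares, in the reverse order of declaration, and it does so before the catch clause runs, even when the body throws. The stdlib sketch below makes that order visible. TracedResource is a hypothetical class that logs events instead of reading a model file, and the simulated IOException stands in for a failed model load.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class TryWithResourcesOrder {

    // Hypothetical resource that records when it is opened and closed.
    static final class TracedResource implements AutoCloseable {
        private final String name;
        private final List<String> log;

        TracedResource(String name, List<String> log) {
            this.name = name;
            this.log = log;
            log.add("open " + name);
        }

        @Override
        public void close() {
            log.add("close " + name);
        }
    }

    static List<String> run(boolean fail) {
        List<String> log = new ArrayList<>();
        try (TracedResource pos = new TracedResource("pos", log);
             TracedResource chunker = new TracedResource("chunker", log)) {
            log.add("body");
            if (fail) {
                throw new IOException("simulated model load failure");
            }
        } catch (IOException ex) {
            // Both resources have already been closed when this handler runs.
            log.add("catch");
        }
        return log;
    }

    public static void main(String[] args) {
        // prints: [open pos, open chunker, body, close chunker, close pos]
        System.out.println(run(false));
        // prints: [open pos, open chunker, body, close chunker, close pos, catch]
        System.out.println(run(true));
    }
}
```

The reverse closing order matters when a later resource depends on an earlier one; here the two model streams are independent, so the order is mainly of interest when reading stack traces or logs.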

The following code sequence creates a tagger and uses it to find the POS tags of the words in the sentence. The sentence and its tags are then displayed:

// sentence holds the tokens to be tagged (a String[])
POSModel model = new POSModel(posModelStream);
POSTaggerME tagger = new POSTaggerME(model);
String[] tags = tagger.tag(sentence);
for (int i = 0; i < tags.length; i++) {
