Using OpenNLP chunking
The process of chunking involves breaking a sentence into parts or chunks. These chunks
can then be annotated with tags. We will use the ChunkerME class to illustrate how this
is accomplished. This class uses a model loaded into a ChunkerModel instance. The
chunk method of the ChunkerME class performs the actual chunking. We will also
examine the use of the chunkAsSpans method to return information about the span of
these chunks. This allows us to see how long a chunk is and what elements make up the
chunk.
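To make the idea of a chunk span concrete, the following is a minimal sketch of how per-token chunk tags can be grouped into spans. It assumes the tags follow the B-/I-/O convention (for example, B-NP starts a noun phrase and I-NP continues it). The ChunkSpanSketch class, its ChunkSpan type, and the toSpans helper are hypothetical names used only for illustration, and the tags in main are written by hand rather than produced by a model. This is not how OpenNLP implements chunkAsSpans:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: groups B-/I-/O chunk tags into token spans.
// Not OpenNLP code; all names here are illustrative assumptions.
class ChunkSpanSketch {

    // A half-open token range [start, end) labeled with a chunk type such as "NP".
    static final class ChunkSpan {
        final int start;
        final int end;
        final String type;

        ChunkSpan(int start, int end, String type) {
            this.start = start;
            this.end = end;
            this.type = type;
        }

        // Number of tokens in the chunk.
        int length() {
            return end - start;
        }
    }

    static List<ChunkSpan> toSpans(String[] chunkTags) {
        List<ChunkSpan> spans = new ArrayList<>();
        int start = -1;
        String type = null; // type of the currently open chunk, or null if none
        for (int i = 0; i < chunkTags.length; i++) {
            String tag = chunkTags[i];
            // An I- tag continues the open chunk only when the types match.
            boolean continues = type != null && tag.startsWith("I-")
                    && tag.substring(2).equals(type);
            if (!continues) {
                if (type != null) {
                    spans.add(new ChunkSpan(start, i, type)); // close the open chunk
                    type = null;
                }
                // B- opens a chunk; a stray I- is treated leniently as an opener.
                if (tag.startsWith("B-") || tag.startsWith("I-")) {
                    start = i;
                    type = tag.substring(2);
                }
            }
        }
        if (type != null) {
            spans.add(new ChunkSpan(start, chunkTags.length, type));
        }
        return spans;
    }

    public static void main(String[] args) {
        // Hand-written sample tokens and tags, not model output.
        String[] tokens = {"The", "voyage", "lasted", "for", "several", "months"};
        String[] chunkTags = {"B-NP", "I-NP", "B-VP", "B-PP", "B-NP", "I-NP"};
        for (ChunkSpan span : toSpans(chunkTags)) {
            String text = String.join(" ",
                    Arrays.copyOfRange(tokens, span.start, span.end));
            System.out.println(span.type + " [" + span.start + ".." + span.end
                    + ") length=" + span.length() + " : " + text);
        }
    }
}
```

Each span records where a chunk starts and ends in the token array, so its length and the tokens it covers can be read off directly.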
We will use the en-pos-maxent.bin file to create a model for the POSTaggerME
instance. The chunker takes the tokens of a sentence together with their POS tags as
input, so we first tag the text with this instance, as we did in the Using OpenNLP
POSTaggerME class for POS taggers section earlier in this chapter. We will also use the
en-chunker.bin file to create a ChunkerModel instance to be used with the
ChunkerME instance.
These models are created using input streams, as shown in the following example.
We use a try-with-resources block to open and close files and to deal with any exceptions
that may be thrown:
try (
        InputStream posModelStream = new FileInputStream(
            getModelDir() + "\\en-pos-maxent.bin");
        InputStream chunkerStream = new FileInputStream(
            getModelDir() + "\\en-chunker.bin");) {
} catch (IOException ex) {
    // Handle exceptions
}
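The behavior of a try-with-resources block with two resources can be illustrated with a small sketch that uses only the standard library. The NamedResource class is a hypothetical stand-in for the two model streams, and the log list exists only to record the order of events. Java closes the resources in the reverse order of their declaration, and it does so before any catch block runs:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of try-with-resources ordering; NamedResource
// stands in for the model input streams and is not part of OpenNLP.
class TryWithResourcesSketch {

    static final class NamedResource implements AutoCloseable {
        private final String name;
        private final List<String> log;

        NamedResource(String name, List<String> log) {
            this.name = name;
            this.log = log;
            log.add("open " + name);
        }

        @Override
        public void close() {
            log.add("close " + name);
        }
    }

    // Runs a block with two resources and returns the recorded event order.
    static List<String> run(boolean failInBody) {
        List<String> log = new ArrayList<>();
        try (NamedResource pos = new NamedResource("posModelStream", log);
             NamedResource chunker = new NamedResource("chunkerStream", log)) {
            log.add("body");
            if (failInBody) {
                throw new IllegalStateException("simulated failure");
            }
        } catch (IllegalStateException ex) {
            // Both resources are already closed when this handler runs.
            log.add("caught " + ex.getMessage());
        }
        return log;
    }

    public static void main(String[] args) {
        System.out.println(run(false));
        System.out.println(run(true));
    }
}
```

Because both streams are declared in the resource list, neither needs an explicit close call, even when the body throws.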
The following code sequence creates and uses a tagger to find the POS of the sentence.
The sentence and its tags are then displayed:
POSModel model = new POSModel(posModelStream);
POSTaggerME tagger = new POSTaggerME(model);
String[] tags = tagger.tag(sentence);
for (int i = 0; i < tags.length; i++) {