Introduction to NLP - Natural Language Processing with Java

Java Reference

In-Depth Information

Detecting Parts of Speech

Another way of classifying the parts of text is at the sentence level. A sentence can be de-

composed into individual words or combinations of words according to categories, such as

nouns, verbs, adverbs, and prepositions. Most of us learned how to do this in school. We

also learned not to end a sentence with a preposition contrary to what we did in the second

sentence of this paragraph.

Detecting the Parts of Speech ( POS ) is useful in other tasks such as extracting relation-

ships and determining the meaning of text. Determining these relationships is called Pars-

ing . POS processing is useful for enhancing the quality of data sent to other elements of a

pipeline.

The internals of a POS process can be complex. Fortunately, most of the complexity is hid-

den from us and encapsulated in classes and methods. We will use a couple of OpenNLP

classes to illustrate this process. We will need a model to detect the POS. The POSModel

class will be used and instanced using the model found in the en-pos-maxent.bin

file, as shown here:

POSModel model = new POSModelLoader().load(

new File("../OpenNLP Models/" "en-pos-maxent.bin"));

The POSTaggerME class is used to perform the actual tagging. Create an instance of this

class based on the previous model as shown here:

POSTaggerME tagger = new POSTaggerME(model);

Next, declare a string containing the text to be processed:

String sentence = "POS processing is useful for enhancing

the "

+ "quality of data sent to other elements of a pipeline.";

Here, we will use a whitespace tokenizer to tokenize the text:

String tokens[] =

WhitespaceTokenizer.INSTANCE.tokenize(sentence);

Search WWH ::

Custom Search

Home