Java Reference
In-Depth Information
Detecting Parts of Speech
Another way of classifying the parts of text is at the sentence level. A sentence can be de-
composed into individual words or combinations of words according to categories, such as
nouns, verbs, adverbs, and prepositions. Most of us learned how to do this in school. We
also learned not to end a sentence with a preposition contrary to what we did in the second
sentence of this paragraph.
Detecting the Parts of Speech ( POS ) is useful in other tasks such as extracting relation-
ships and determining the meaning of text. Determining these relationships is called Pars-
ing . POS processing is useful for enhancing the quality of data sent to other elements of a
pipeline.
The internals of a POS process can be complex. Fortunately, most of the complexity is hid-
den from us and encapsulated in classes and methods. We will use a couple of OpenNLP
classes to illustrate this process. We will need a model to detect the POS. The POSModel
class will be used and instanced using the model found in the en-pos-maxent.bin
file, as shown here:
POSModel model = new POSModelLoader().load(
new File("../OpenNLP Models/" "en-pos-maxent.bin"));
The POSTaggerME class is used to perform the actual tagging. Create an instance of this
class based on the previous model as shown here:
POSTaggerME tagger = new POSTaggerME(model);
Next, declare a string containing the text to be processed:
String sentence = "POS processing is useful for enhancing
the "
+ "quality of data sent to other elements of a pipeline.";
Here, we will use a whitespace tokenizer to tokenize the text:
String tokens[] =
WhitespaceTokenizer.INSTANCE.tokenize(sentence);
Search WWH ::




Custom Search