Detecting Part of Speech - Natural Language Processing with Java


The/DT voyage/NN of/IN the/DT Abraham/NNP Lincoln/NNP was/VBD for/IN a/DT long/JJ time/NN marked/VBN by/IN no/DT special/JJ incident/NN ./.
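The tagged output uses the word/TAG convention with Penn Treebank tags such as DT (determiner), NN (singular noun), NNP (singular proper noun), VBD (past-tense verb), VBN (past participle), and JJ (adjective). As a minimal sketch using only the JDK (Java 16+ for records), the output can be split back into word and tag pairs by cutting each token at its last slash. The class and record names are hypothetical and are not part of CoreNLP:

```java
import java.util.ArrayList;
import java.util.List;

public class TaggedTokens {
    // A word and its POS tag, e.g. ("voyage", "NN"); hypothetical helper type
    record TaggedWord(String word, String tag) {}

    // Split each "word/TAG" token on its LAST slash, so that a token such as
    // "./." yields the word "." and the tag "."
    static List<TaggedWord> parse(String tagged) {
        List<TaggedWord> result = new ArrayList<>();
        for (String token : tagged.trim().split("\\s+")) {
            int slash = token.lastIndexOf('/');
            if (slash <= 0) {
                continue; // skip tokens with no word before the separator
            }
            result.add(new TaggedWord(token.substring(0, slash), token.substring(slash + 1)));
        }
        return result;
    }

    public static void main(String[] args) {
        String tagged = "The/DT voyage/NN of/IN the/DT Abraham/NNP Lincoln/NNP was/VBD "
                + "for/IN a/DT long/JJ time/NN marked/VBN by/IN no/DT "
                + "special/JJ incident/NN ./.";
        // Print one word and its tag per line, separated by a tab
        for (TaggedWord tw : parse(tagged)) {
            System.out.println(tw.word() + "\t" + tw.tag());
        }
    }
}
```

Splitting on the last slash rather than the first keeps punctuation tokens intact, since the period token contains the separator character as its word.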

The pipeline accepts additional options that control how the tagger works. For example, by default the english-left3words-distsim.tagger model is used. We can specify a different model using the pos.model property, as shown here. There is also a pos.maxlen property that controls the maximum sentence size the tagger will process:

props.put("pos.model",
        "C:/.../Models/english-caseless-left3words-distsim.tagger");
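A minimal sketch of assembling these options with java.util.Properties, assuming the standard tokenize, ssplit, and pos annotator names. The class and method names are hypothetical, and the pos.maxlen value of 100 is purely illustrative. Only the Properties object is built here, so the sketch runs without CoreNLP on the classpath:

```java
import java.util.Properties;

public class PosPipelineConfig {
    // Builds the Properties that would be handed to a StanfordCoreNLP pipeline.
    // Hypothetical helper: the pipeline itself is not created here.
    static Properties posProperties(String modelPath, int maxSentenceLength) {
        Properties props = new Properties();
        // The pos annotator needs tokenization and sentence splitting first
        props.setProperty("annotators", "tokenize, ssplit, pos");
        // Replace the default english-left3words-distsim.tagger model
        props.setProperty("pos.model", modelPath);
        // Upper limit on sentence size for the tagger (illustrative value)
        props.setProperty("pos.maxlen", String.valueOf(maxSentenceLength));
        return props;
    }

    public static void main(String[] args) {
        Properties props = posProperties(
                "C:/.../Models/english-caseless-left3words-distsim.tagger", 100);
        props.list(System.out);
        // With CoreNLP on the classpath, the pipeline would then be created as:
        // StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
    }
}
```

setProperty is used instead of put because it restricts keys and values to strings, which is what the pipeline reads back.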

Sometimes it is useful to have a tagged document that is XML formatted. The StanfordCoreNLP class's xmlPrint method will write out such a document. The method's first argument is the Annotation object to be displayed, and its second argument is the OutputStream object to write to. In the following code sequence, the previous tagging results are written to standard output. The call is enclosed in a try-catch block to handle IO exceptions:

try {
    // Write the annotated document as XML to standard output
    pipeline.xmlPrint(document, System.out);
} catch (IOException ex) {
    // Handle exceptions (IOException is in java.io)
}
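The same pattern works with any OutputStream, not only System.out. As a hedged sketch using only the JDK's StAX API (javax.xml.stream), the hypothetical writeTokens method below writes word and POS pairs as simplified token elements. The element names are assumptions loosely modeled on the kind of output described here; this is not the schema that xmlPrint actually produces:

```java
import java.io.IOException;
import java.io.OutputStream;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

public class TokenXmlWriter {
    // Writes (word, tag) pairs as <token> elements to any OutputStream.
    // Hypothetical, simplified layout: not the schema produced by CoreNLP's xmlPrint.
    static void writeTokens(String[][] tokens, OutputStream os) throws IOException {
        try {
            XMLStreamWriter xml =
                    XMLOutputFactory.newInstance().createXMLStreamWriter(os, "UTF-8");
            xml.writeStartDocument("UTF-8", "1.0");
            xml.writeStartElement("tokens");
            for (int i = 0; i < tokens.length; i++) {
                xml.writeStartElement("token");
                xml.writeAttribute("id", String.valueOf(i + 1)); // 1-based position
                xml.writeStartElement("word");
                xml.writeCharacters(tokens[i][0]);
                xml.writeEndElement();
                xml.writeStartElement("POS");
                xml.writeCharacters(tokens[i][1]);
                xml.writeEndElement();
                xml.writeEndElement(); // closes <token>
            }
            xml.writeEndElement(); // closes <tokens>
            xml.writeEndDocument();
            xml.flush();
            xml.close(); // releases the writer; the underlying stream stays open
        } catch (XMLStreamException ex) {
            // Report StAX failures as IOException, mirroring the try-catch used above
            throw new IOException("Could not write token XML", ex);
        }
    }

    public static void main(String[] args) {
        String[][] tokens = {{"The", "DT"}, {"voyage", "NN"}, {".", "."}};
        try {
            writeTokens(tokens, System.out);
            System.out.println();
        } catch (IOException ex) {
            // Handle exceptions
            System.err.println("Could not write XML: " + ex.getMessage());
        }
    }
}
```

Passing a FileOutputStream or ByteArrayOutputStream instead of System.out sends the same XML to a file or keeps it in memory.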

A partial listing of the xmlPrint output follows. Only the first two words and the last word are displayed. Each token element contains the word, its position, and its POS tag:

<?xml version="1.0" encoding="UTF-8"?>

<?xml-stylesheet href="CoreNLP-to-HTML.xsl" type="text/xsl"?>

<root>
