Using the Stanford API
The Stanford NLP API supports several approaches to parsing. First, we will demonstrate a general-purpose parser, the LexicalizedParser class. Then, we will illustrate how the result of the parser can be displayed using the TreePrint class. This will be followed by a demonstration of how to determine word dependencies using the GrammaticalStructure class.
Using the LexicalizedParser class
The LexicalizedParser class is a lexicalized PCFG (probabilistic context-free grammar) parser. It can use various models to perform the parsing process. The apply method is used with a List of CoreLabel objects to create a parse tree.
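To make the term PCFG more concrete, the following is a minimal sketch of how a probabilistic context-free grammar scores a parse tree: the probability of the tree is the product of the probabilities of the rules used to build it. The PcfgSketch class, its Node type, and the rule probabilities are all hypothetical and chosen by hand for illustration. They are not part of the Stanford API, and a lexicalized parser additionally conditions its rules on head words, which this sketch leaves out.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: a toy PCFG, unrelated to the Stanford implementation.
class PcfgSketch {

    // A parse tree node: a label plus children (a node without children is a word).
    static final class Node {
        final String label;
        final List<Node> children;

        Node(String label, Node... children) {
            this.label = label;
            this.children = Arrays.asList(children);
        }
    }

    // Hypothetical rule probabilities, picked by hand for this example.
    static Map<String, Double> toyGrammar() {
        Map<String, Double> rules = new HashMap<>();
        rules.put("S -> NP VP", 1.0);
        rules.put("NP -> DT NN", 0.5);
        rules.put("VP -> VBD", 0.4);
        rules.put("DT -> The", 0.6);
        rules.put("NN -> cow", 0.1);
        rules.put("VBD -> jumped", 0.2);
        return rules;
    }

    // Probability of a tree = product of the probabilities of every rule it uses.
    static double treeProbability(Node node, Map<String, Double> rules) {
        if (node.children.isEmpty()) {
            return 1.0; // a word contributes no rule of its own
        }
        StringBuilder rule = new StringBuilder(node.label).append(" ->");
        double childProbability = 1.0;
        for (Node child : node.children) {
            rule.append(' ').append(child.label);
            childProbability *= treeProbability(child, rules);
        }
        Double ruleProbability = rules.get(rule.toString());
        if (ruleProbability == null) {
            return 0.0; // the grammar has no such rule
        }
        return ruleProbability * childProbability;
    }

    // Hand-built tree for the fragment "The cow jumped".
    static Node sampleTree() {
        return new Node("S",
            new Node("NP",
                new Node("DT", new Node("The")),
                new Node("NN", new Node("cow"))),
            new Node("VP",
                new Node("VBD", new Node("jumped"))));
    }

    public static void main(String[] args) {
        System.out.println(treeProbability(sampleTree(), toyGrammar()));
    }
}
```

A parser of this kind searches for the tree with the highest such probability among all trees the grammar allows for the sentence. The sketch only scores one given tree.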
In the following code sequence, the parser is instantiated using the englishPCFG.ser.gz model:
String parserModel = ".../models/lexparser/englishPCFG.ser.gz";
LexicalizedParser lexicalizedParser =
    LexicalizedParser.loadModel(parserModel);
The List of CoreLabel objects is created using the Sentence class's toCoreLabelList method (in recent CoreNLP releases this utility class is named SentenceUtils). Each CoreLabel object contains a word and other information. There are no tags or labels for these words. The words in the array have, in effect, already been tokenized.
String[] sentenceArray = {"The", "cow", "jumped", "over",
    "the", "moon", "."};
List<CoreLabel> words =
    Sentence.toCoreLabelList(sentenceArray);
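The array above was split into tokens by hand. When the input is a plain string, it has to be tokenized first. The following is a rough sketch using only the standard library, assuming that a token is either a run of word characters or a single punctuation mark. The NaiveTokenizerSketch class is hypothetical, and the Stanford library provides its own, far more careful tokenizers that would normally be used instead.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative only: a naive regex tokenizer, not the Stanford tokenizer.
class NaiveTokenizerSketch {

    // A token is a run of word characters or one non-space, non-word character.
    private static final Pattern TOKEN = Pattern.compile("\\w+|[^\\w\\s]");

    static String[] tokenize(String sentence) {
        List<String> tokens = new ArrayList<>();
        Matcher matcher = TOKEN.matcher(sentence);
        while (matcher.find()) {
            tokens.add(matcher.group());
        }
        return tokens.toArray(new String[0]);
    }

    public static void main(String[] args) {
        // prints: The | cow | jumped | over | the | moon | .
        System.out.println(String.join(" | ",
            tokenize("The cow jumped over the moon.")));
    }
}
```

The resulting String array has the same shape as the hand-written one, so it could be passed to toCoreLabelList in the same way. Contractions, abbreviations, and hyphenated words are cases where this simple pattern falls short.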
The apply method can now be invoked:
Tree parseTree = lexicalizedParser.apply(words);
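The Tree that apply returns is a nested structure: each node carries a label and a list of child nodes, with the words at the leaves. As a rough illustration of that shape, the sketch below builds such a tree by hand and renders it in bracketed notation. The BracketTreeSketch class and its Node type are hypothetical stand-ins, not the Stanford Tree class, and the tree shown is one plausible analysis of the sentence written by hand, not output produced by the parser.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative only: a hand-rolled tree, not edu.stanford.nlp.trees.Tree.
class BracketTreeSketch {

    static final class Node {
        final String label;
        final List<Node> children;

        Node(String label, Node... children) {
            this.label = label;
            this.children = Arrays.asList(children);
        }

        boolean isLeaf() {
            return children.isEmpty();
        }
    }

    // Renders a leaf as its word and any other node as "(LABEL child child ...)".
    static String toBracketString(Node node) {
        if (node.isLeaf()) {
            return node.label;
        }
        StringBuilder sb = new StringBuilder("(").append(node.label);
        for (Node child : node.children) {
            sb.append(' ').append(toBracketString(child));
        }
        return sb.append(')').toString();
    }

    // Hand-built constituency tree for "The cow jumped over the moon."
    static Node sampleTree() {
        return new Node("S",
            new Node("NP",
                new Node("DT", new Node("The")),
                new Node("NN", new Node("cow"))),
            new Node("VP",
                new Node("VBD", new Node("jumped")),
                new Node("PP",
                    new Node("IN", new Node("over")),
                    new Node("NP",
                        new Node("DT", new Node("the")),
                        new Node("NN", new Node("moon"))))),
            new Node(".", new Node(".")));
    }

    public static void main(String[] args) {
        // prints: (S (NP (DT The) (NN cow)) (VP (VBD jumped) (PP (IN over) (NP (DT the) (NN moon)))) (. .))
        System.out.println(toBracketString(sampleTree()));
    }
}
```

Each pair of parentheses corresponds to one node, so the nesting of the brackets mirrors the nesting of the phrases in the sentence.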
One simple approach to displaying the result of the parse is to use the pennPrint method, which displays the parse tree in the same way as the Penn Treebank does