Using the Stanford API
There are several approaches to parsing available in the Stanford NLP API. First, we will
demonstrate a general-purpose parser, the LexicalizedParser class. Then, we will
illustrate how the result of the parser can be displayed using the TreePrint class. This
will be followed by a demonstration of how to determine word dependencies using the
GrammaticalStructure class.
Using the LexicalizedParser class
The LexicalizedParser class is a lexicalized probabilistic context-free grammar (PCFG)
parser. It can use various models to perform the parsing process. Its apply method takes
a List of CoreLabel objects and returns a parse tree.
In the following code sequence, the parser is instantiated using the
englishPCFG.ser.gz model:
String parserModel = ".../models/lexparser/englishPCFG.ser.gz";
LexicalizedParser lexicalizedParser =
    LexicalizedParser.loadModel(parserModel);
The List of CoreLabel objects is created using the Sentence class's
toCoreLabelList method (newer Stanford releases name this utility class SentenceUtils).
Each CoreLabel object holds a word along with other information, but at this point the
words carry no tags or labels. Because each array element is a single word or punctuation
mark, the input is effectively already tokenized.
String[] sentenceArray = {"The", "cow", "jumped", "over",
    "the", "moon", "."};
List<CoreLabel> words =
    Sentence.toCoreLabelList(sentenceArray);
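The array above was written out by hand. To show what "effectively tokenized" means, the following sketch splits a raw sentence into the same kind of array using only the standard library. The NaiveTokenizer class and its regular expression are assumptions made for illustration; this is a deliberately crude stand-in and not the tokenizer that ships with the Stanford API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of the Stanford API.
public class NaiveTokenizer {

    // A token is either a run of word characters or a single
    // character that is neither a word character nor whitespace.
    private static final Pattern TOKEN = Pattern.compile("\\w+|[^\\w\\s]");

    static String[] tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        Matcher matcher = TOKEN.matcher(text);
        while (matcher.find()) {
            tokens.add(matcher.group());
        }
        return tokens.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String[] tokens = tokenize("The cow jumped over the moon.");
        // Print one token per line so the boundaries are visible.
        for (String token : tokens) {
            System.out.println(token);
        }
    }
}
```

A rule this simple mishandles contractions, abbreviations, and hyphenated words, so it is only suitable for picturing the shape of the input, not for real text.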
The apply method can now be invoked:
Tree parseTree = lexicalizedParser.apply(words);
One simple approach to display the result of the parse is to use the pennPrint method,
which displays the parse tree in the same way as the Penn Treebank does
(http://www.sfs.uni-tuebingen.de/~dm/07/autumn/795.10/ptb-annotation-guide/root.html):
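As a rough picture of what a bracketed, Penn Treebank style layout looks like, the following self-contained sketch builds a small tree by hand and prints each constituent on its own indented line. The Node class, the labels, and the tree shape are all hypothetical choices made for illustration. This is not Stanford's Tree class, its line-breaking rules are simpler than those of pennPrint, and the tree is not the parser's actual output.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of bracketed tree printing; not the Stanford API.
public class BracketedTreeSketch {

    /** Minimal tree node: a label plus zero or more children. */
    static final class Node {
        final String label;
        final List<Node> children;

        Node(String label, Node... children) {
            this.label = label;
            this.children = Arrays.asList(children);
        }

        boolean isLeaf() {
            return children.isEmpty();
        }

        /** True when this node is a tag sitting directly above one word. */
        boolean isPreTerminal() {
            return children.size() == 1 && children.get(0).isLeaf();
        }
    }

    static String render(Node node) {
        StringBuilder out = new StringBuilder();
        render(node, 0, out);
        return out.toString();
    }

    private static void render(Node node, int depth, StringBuilder out) {
        for (int i = 0; i < depth; i++) {
            out.append("  "); // two spaces per nesting level
        }
        if (node.isLeaf()) {
            out.append(node.label);
        } else if (node.isPreTerminal()) {
            // Tag and word stay together on one line, e.g. (NN cow).
            out.append('(').append(node.label).append(' ')
               .append(node.children.get(0).label).append(')');
        } else {
            out.append('(').append(node.label);
            for (Node child : node.children) {
                out.append('\n');
                render(child, depth + 1, out);
            }
            out.append(')');
        }
    }

    public static void main(String[] args) {
        // Hand-built tree for the example sentence (an assumed analysis).
        Node tree = new Node("ROOT",
            new Node("S",
                new Node("NP",
                    new Node("DT", new Node("The")),
                    new Node("NN", new Node("cow"))),
                new Node("VP",
                    new Node("VBD", new Node("jumped")),
                    new Node("PP",
                        new Node("IN", new Node("over")),
                        new Node("NP",
                            new Node("DT", new Node("the")),
                            new Node("NN", new Node("moon"))))),
                new Node(".", new Node("."))));
        System.out.println(render(tree));
    }
}
```

Each opening parenthesis introduces a constituent label, and nesting depth is shown by indentation, which is the property that makes this style easy to read for deeply nested sentences.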