Finding word dependencies using the GrammaticalStructure class
Another approach to parsing text is to use the LexicalizedParser object created in the
previous section in conjunction with the TreebankLanguagePack interface. A Treebank
is a text corpus that has been annotated with syntactic or semantic information
describing the structure of its sentences. The first major Treebank was the Penn
Treebank (http://www.cis.upenn.edu/~treebank/). Treebanks can be created manually or
semiautomatically.
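To make the idea of syntactic annotation concrete, the following is a minimal sketch that reads a Penn Treebank-style bracketed string and lists each word with its part-of-speech tag. The bracketing is hand-written for illustration and is not copied from a corpus or produced by a parser; the class and method names (BracketedTreeSketch, taggedWords) are hypothetical and use only the Java standard library.

```java
import java.util.ArrayList;
import java.util.List;

public class BracketedTreeSketch {

    // Hand-written Penn Treebank-style bracketing, for illustration only;
    // it is not output copied from any parser or corpus.
    static final String ANNOTATED =
        "(ROOT (S (NP (DT The) (NN cow)) (VP (VBD jumped) "
        + "(PP (IN over) (NP (DT the) (NN moon)))) (. .)))";

    static boolean isParen(String token) {
        return token.equals("(") || token.equals(")");
    }

    // Collects "word/TAG" pairs by finding every "( TAG word )" leaf group.
    static List<String> taggedWords(String bracketed) {
        // Pad parentheses with spaces so that they become separate tokens.
        String[] tokens = bracketed
            .replace("(", " ( ")
            .replace(")", " ) ")
            .trim()
            .split("\\s+");

        List<String> result = new ArrayList<>();
        for (int i = 0; i + 3 < tokens.length; i++) {
            boolean isLeaf = tokens[i].equals("(")
                && !isParen(tokens[i + 1])
                && !isParen(tokens[i + 2])
                && tokens[i + 3].equals(")");
            if (isLeaf) {
                result.add(tokens[i + 2] + "/" + tokens[i + 1]);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        for (String tagged : taggedWords(ANNOTATED)) {
            System.out.println(tagged);
        }
    }
}
```

Only the leaves are extracted here; the enclosing labels such as NP, VP, and PP are what carry the phrase structure that a Treebank records.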
The next example illustrates how a simple string can be parsed using the parser. A
tokenizer factory creates a tokenizer, which splits the sentence into a list of
tokens that is then passed to the parser.
The CoreLabel class that we discussed in the Using the LexicalizedParser class section
is used here:
String sentence = "The cow jumped over the moon.";
TokenizerFactory<CoreLabel> tokenizerFactory =
    PTBTokenizer.factory(new CoreLabelTokenFactory(), "");
Tokenizer<CoreLabel> tokenizer =
    tokenizerFactory.getTokenizer(new StringReader(sentence));
List<CoreLabel> wordList = tokenizer.tokenize();
parseTree = lexicalizedParser.apply(wordList);
The TreebankLanguagePack interface specifies methods for working with a Treebank.
In the following code, a series of objects is created, culminating in the creation
of a TypedDependency instance, which is used to obtain dependency information
about elements of a sentence. A GrammaticalStructureFactory instance is created
and used to create a GrammaticalStructure instance. As the class name implies,
it stores the grammatical relationships between elements in the tree:
TreebankLanguagePack tlp =
    lexicalizedParser.treebankLanguagePack();
GrammaticalStructureFactory gsf =
    tlp.grammaticalStructureFactory();
GrammaticalStructure gs =
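To make the notion of a typed dependency concrete, the following is a minimal, library-free sketch that models a dependency as a named relation between a governor word and a dependent word. It does not use the Stanford classes: Dep, example, and dependentsOf are hypothetical names, and the three relations listed are a hand-written, partial illustration for the sample sentence rather than parser output. The relation(governor-index, dependent-index) notation is the one conventionally used for such dependencies.

```java
import java.util.ArrayList;
import java.util.List;

public class DependencySketch {

    // Hypothetical stand-in for a typed dependency: a named relation
    // between a governor word and a dependent word (1-based positions).
    static final class Dep {
        final String relation;
        final String governor;
        final int governorIndex;
        final String dependent;
        final int dependentIndex;

        Dep(String relation, String governor, int governorIndex,
            String dependent, int dependentIndex) {
            this.relation = relation;
            this.governor = governor;
            this.governorIndex = governorIndex;
            this.dependent = dependent;
            this.dependentIndex = dependentIndex;
        }

        @Override
        public String toString() {
            return relation + "(" + governor + "-" + governorIndex
                + ", " + dependent + "-" + dependentIndex + ")";
        }
    }

    // Hand-written, partial list for "The cow jumped over the moon."
    // These values are assumptions for illustration, not parser output.
    static List<Dep> example() {
        List<Dep> deps = new ArrayList<>();
        deps.add(new Dep("det", "cow", 2, "The", 1));
        deps.add(new Dep("nsubj", "jumped", 3, "cow", 2));
        deps.add(new Dep("det", "moon", 6, "the", 5));
        return deps;
    }

    // Returns the words that depend directly on the given governor word.
    static List<String> dependentsOf(List<Dep> deps, String governor) {
        List<String> result = new ArrayList<>();
        for (Dep dep : deps) {
            if (dep.governor.equals(governor)) {
                result.add(dep.dependent);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        for (Dep dep : example()) {
            System.out.println(dep);
        }
    }
}
```

Each entry names one grammatical relationship, so questions such as "which word is the subject of jumped?" become simple lookups over the list.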