These annotators tokenize the text, split it into sentences, and then find the POS tags:
// Annotators run in the order listed: tokenizer, sentence splitter, POS tagger
Properties props = new Properties();
props.setProperty("annotators", "tokenize, ssplit, pos");
StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
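The annotators property is a comma-separated list of stage names, and the order in which they are listed matters because the POS tagger depends on tokens and sentences being available. As a rough illustration only, the following sketch uses nothing but java.util to show how such a property value can be read back as an ordered list. The class AnnotatorListSketch and the helper annotatorNames are hypothetical names introduced here; they are not part of Stanford CoreNLP, and the library's own parsing may differ.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

public class AnnotatorListSketch {
    // Hypothetical helper (not a CoreNLP API): splits the "annotators"
    // property into an ordered list of trimmed, non-empty stage names.
    static List<String> annotatorNames(Properties props) {
        List<String> names = new ArrayList<>();
        String value = props.getProperty("annotators", "");
        for (String name : value.split(",")) {
            String trimmed = name.trim();
            if (!trimmed.isEmpty()) {
                names.add(trimmed);
            }
        }
        return names;
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("annotators", "tokenize, ssplit, pos");
        // Prints the stage names in the order they were listed.
        System.out.println(annotatorNames(props));
    }
}
```

Using setProperty rather than put keeps both key and value typed as String, which is what getProperty expects when reading the value back.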
To process the text, we pass the theSentence variable to the Annotation constructor.
The pipeline's annotate method is then invoked, as shown here:
Annotation document = new Annotation(theSentence);
pipeline.annotate(document);
Since the pipeline can perform different types of processing, a list of CoreMap objects is
used to access the words and tags. The Annotation class's get method returns the list
of sentences, as shown here:
List<CoreMap> sentences =
    document.get(SentencesAnnotation.class);
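The get call above takes a class literal as its key, and the key class determines the type of the value that comes back. The following is a minimal sketch of that type-keyed lookup idea using only the standard library. The names TypedMapSketch, Key, WordKey, WordsKey, and TypedMap are hypothetical and chosen for illustration; this is not CoreNLP's implementation, only a simplified picture of the pattern.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class TypedMapSketch {
    // Marker interface: a key class declares the type of value it maps to.
    interface Key<V> {}

    // Hypothetical key classes, standing in for annotation classes.
    static final class WordKey implements Key<String> {}
    static final class WordsKey implements Key<List<String>> {}

    // A small map whose keys are class objects rather than strings.
    static final class TypedMap {
        private final Map<Class<?>, Object> values = new HashMap<>();

        <V> void set(Class<? extends Key<V>> key, V value) {
            values.put(key, value);
        }

        // The cast is safe as long as values are stored only through set().
        @SuppressWarnings("unchecked")
        <V> V get(Class<? extends Key<V>> key) {
            return (V) values.get(key);
        }
    }

    public static void main(String[] args) {
        TypedMap map = new TypedMap();
        map.set(WordKey.class, "dog");
        map.set(WordsKey.class, Arrays.asList("the", "dog"));

        // No casts are needed at the call site; the key class fixes the type.
        String word = map.get(WordKey.class);
        List<String> words = map.get(WordsKey.class);
        System.out.println(word + " " + words);
    }
}
```

The benefit of this design is that one container can hold values of many different types while callers still get compile-time type checking, which is why a single get method can return a list of sentences for one key and a string for another.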
The contents of a CoreMap object can be accessed using its get method. The
method's argument is the class for the information needed. As shown in the following code
example, the tokens of a sentence are accessed using the TokensAnnotation class, the text
of each token using the TextAnnotation class, and the POS tag can be retrieved using the
PartOfSpeechAnnotation class. Each word of each sentence and its tag are displayed:
for (CoreMap sentence : sentences) {
    for (CoreLabel token :
            sentence.get(TokensAnnotation.class)) {
        String word = token.get(TextAnnotation.class);
        String pos =
            token.get(PartOfSpeechAnnotation.class);
        System.out.print(word + "/" + pos + " ");
    }
    System.out.println();
}
The output will be as follows: