Finding Parts of Text - Natural Language Processing with Java


List<HasWord> sentence = it.next();
for (HasWord token : sentence) {
    System.out.println(token);
}

When executed, we get the following output:

Let
's
pause
,
and
then
reflect
.

Using a pipeline

Here, we will use the StanfordCoreNLP class as demonstrated in Chapter 1, Introduction to NLP. However, we use a simpler annotator string to tokenize the paragraph. As shown next, a Properties object is created and assigned the annotators tokenize and ssplit.

The tokenize annotator specifies that tokenization will occur, and the ssplit annotator causes the text to be split into sentences:

Properties properties = new Properties();
properties.put("annotators", "tokenize, ssplit");
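The annotators value is a single comma-separated string, not a list. The following standard-library sketch illustrates how such a string breaks down into individual annotator names. The class AnnotatorListDemo and the helper parseAnnotators are hypothetical names introduced for illustration; this is not CoreNLP's own parsing code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

public class AnnotatorListDemo {

    // Hypothetical helper: splits the comma-separated "annotators" value
    // into trimmed, non-empty annotator names.
    static List<String> parseAnnotators(Properties properties) {
        List<String> names = new ArrayList<>();
        String value = properties.getProperty("annotators", "");
        for (String name : value.split(",")) {
            String trimmed = name.trim();
            if (!trimmed.isEmpty()) {
                names.add(trimmed);
            }
        }
        return names;
    }

    public static void main(String[] args) {
        Properties properties = new Properties();
        // setProperty stores String values only, which is what getProperty expects
        properties.setProperty("annotators", "tokenize, ssplit");
        System.out.println(parseAnnotators(properties)); // prints [tokenize, ssplit]
    }
}
```

The order of the names matters in practice, since ssplit works on the tokens produced by tokenize.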

Instances of the StanfordCoreNLP and Annotation classes are created next:

StanfordCoreNLP pipeline = new StanfordCoreNLP(properties);
Annotation annotation = new Annotation(paragraph);

The annotate method is executed to tokenize the text, and the prettyPrint method then displays the tokens:

pipeline.annotate(annotation);
pipeline.prettyPrint(annotation, System.out);
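The prettyPrint output is fairly verbose. If only the token text is wanted, the tokens can also be read back from the Annotation object after annotate has run. The following is a minimal sketch, assuming the Stanford CoreNLP library is on the classpath; the class name PipelineTokenDemo, the method tokenize, and the sample sentence are assumptions made for illustration.

```java
import edu.stanford.nlp.ling.CoreAnnotations;
import edu.stanford.nlp.ling.CoreLabel;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;
import edu.stanford.nlp.util.CoreMap;

import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

public class PipelineTokenDemo {

    // Runs a tokenize + ssplit pipeline and collects the token text in order.
    static List<String> tokenize(String text) {
        Properties properties = new Properties();
        properties.setProperty("annotators", "tokenize, ssplit");

        StanfordCoreNLP pipeline = new StanfordCoreNLP(properties);
        Annotation annotation = new Annotation(text);
        pipeline.annotate(annotation);

        List<String> words = new ArrayList<>();
        // ssplit stores the sentences; each sentence holds its own tokens
        for (CoreMap sentence : annotation.get(CoreAnnotations.SentencesAnnotation.class)) {
            for (CoreLabel token : sentence.get(CoreAnnotations.TokensAnnotation.class)) {
                words.add(token.word());
            }
        }
        return words;
    }

    public static void main(String[] args) {
        // Sample text assumed for this sketch
        for (String word : tokenize("Let's pause, and then reflect.")) {
            System.out.println(word);
        }
    }
}
```

Building the pipeline once and reusing it for several texts is usually preferable to constructing it on every call; it is created inside the method here only to keep the sketch short.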
