Finding Sentences - Natural Language Processing with Java


[is]

[done]

...

[who]

[knows]

[.]

Using the StanfordCoreNLP class

The StanfordCoreNLP class supports sentence detection through the ssplit annotator. The following example uses the tokenize and ssplit annotators; ssplit operates on tokens, so tokenize must be listed first. A pipeline object is created, the paragraph is wrapped in an Annotation object, and the pipeline's annotate method is called with that annotation as its argument:

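import java.util.Properties;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;

Properties properties = new Properties();
properties.setProperty("annotators", "tokenize, ssplit");
StanfordCoreNLP pipeline = new StanfordCoreNLP(properties);
Annotation annotation = new Annotation(paragraph);
pipeline.annotate(annotation);
// Assumption: the original listing omits the print call; prettyPrint emits this output format
pipeline.prettyPrint(annotation, System.out);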

The output is detailed: each sentence is listed with its token count, followed by every token and its character offsets. Only the output for the first sentence is shown here:

Sentence #1 (13 tokens):
When determining the end of sentences we need to consider several factors.
[Text=When CharacterOffsetBegin=0 CharacterOffsetEnd=4]
[Text=determining CharacterOffsetBegin=5 CharacterOffsetEnd=16]
[Text=the CharacterOffsetBegin=17 CharacterOffsetEnd=20]
[Text=end CharacterOffsetBegin=21 CharacterOffsetEnd=24]
[Text=of CharacterOffsetBegin=25 CharacterOffsetEnd=27]
[Text=sentences CharacterOffsetBegin=28 CharacterOffsetEnd=37]
[Text=we CharacterOffsetBegin=38 CharacterOffsetEnd=40]
[Text=need CharacterOffsetBegin=41 CharacterOffsetEnd=45]
[Text=to CharacterOffsetBegin=46 CharacterOffsetEnd=48]
[Text=consider CharacterOffsetBegin=49 CharacterOffsetEnd=57]
[Text=several CharacterOffsetBegin=58 CharacterOffsetEnd=65]
[Text=factors CharacterOffsetBegin=66
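The offsets in this listing are easier to read once you notice that CharacterOffsetBegin is the index of a token's first character and CharacterOffsetEnd is one past its last, so "When" spans 0 to 4. The following is a minimal sketch of that convention using only the standard library. The OffsetSketch class, its describe method, and the regular expression are hypothetical stand-ins for illustration; this is not how CoreNLP tokenizes text, and it will diverge from CoreNLP on contractions, abbreviations, and similar cases.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative only: a regex stand-in for a tokenizer, not CoreNLP's own tokenizer.
class OffsetSketch {
    // A run of word characters, or a single non-space punctuation character.
    private static final Pattern TOKEN = Pattern.compile("\\w+|[^\\w\\s]");

    // Returns one line per token, laid out like the token entries in the listing.
    static String[] describe(String text) {
        List<String> lines = new ArrayList<>();
        Matcher m = TOKEN.matcher(text);
        while (m.find()) {
            // m.start() is the index of the first character;
            // m.end() is one past the index of the last character.
            lines.add("[Text=" + m.group()
                    + " CharacterOffsetBegin=" + m.start()
                    + " CharacterOffsetEnd=" + m.end() + "]");
        }
        return lines.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String sentence = "When determining the end of sentences "
                + "we need to consider several factors.";
        for (String line : describe(sentence)) {
            System.out.println(line);
        }
    }
}
```

Because the end offset is exclusive, sentence.substring(begin, end) recovers the token text directly, and end minus begin gives its length.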
