Finding Sentences - Natural Language Processing with Java

Using the Trained Model

We can then use the model as illustrated in the next code sequence. This is based on the techniques illustrated in the "Using the SentenceDetectorME class" section earlier in this chapter:

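    // Imports needed at the top of the enclosing source file
    import java.io.File;
    import java.io.FileInputStream;
    import java.io.FileNotFoundException;
    import java.io.IOException;
    import java.io.InputStream;
    import opennlp.tools.sentdetect.SentenceDetectorME;
    import opennlp.tools.sentdetect.SentenceModel;

    // Inside a method; getModelDir() and paragraph are assumed to be defined elsewhere
    try (InputStream is = new FileInputStream(
            new File(getModelDir(), "modelFile"))) {
        // Load the trained model and wrap it in a detector
        SentenceModel model = new SentenceModel(is);
        SentenceDetectorME detector = new SentenceDetectorME(model);
        // Split the paragraph and print one detected sentence per line
        String[] sentences = detector.sentDetect(paragraph);
        for (String sentence : sentences) {
            System.out.println(sentence);
        }
    } catch (FileNotFoundException ex) {
        // Handle exception: the model file could not be opened
    } catch (IOException ex) {
        // Handle exception: the model could not be read
    }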

The output is as follows:

When determining the end of sentences we need to consider several factors.
Sentences may end with exclamation marks! Or possibly questions marks?
Within sentences we may find numbers like 3.14159, abbreviations such as found in Mr.
Smith, and possibly ellipses either within a sentence …, or at the end of a sentence…

This model did not process the last sentence very well: it was split after the abbreviation Mr. This reflects a mismatch between the text the model was trained on and the text it is applied to. Using relevant training data is important; otherwise, downstream tasks based on this output will suffer.
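One inexpensive way to notice this kind of error before it reaches downstream tasks is to scan the detector's output for sentences that end in a common title abbreviation. The following is a minimal sketch of that idea using only the standard library. It is not part of OpenNLP; the class name SplitCheck, the method suspiciousSplits, and the short abbreviation list are hypothetical and chosen only for illustration:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class SplitCheck {
    // Hypothetical, deliberately short list of abbreviations that rarely end a sentence
    private static final Set<String> ABBREVIATIONS =
            new HashSet<>(Arrays.asList("Mr.", "Mrs.", "Ms.", "Dr.", "Prof."));

    // Returns the indexes of sentences whose last token is a listed abbreviation
    public static List<Integer> suspiciousSplits(String[] sentences) {
        List<Integer> flagged = new ArrayList<>();
        for (int i = 0; i < sentences.length; i++) {
            String trimmed = sentences[i].trim();
            // The last token is everything after the final space (or the whole string)
            String lastToken = trimmed.substring(trimmed.lastIndexOf(' ') + 1);
            if (ABBREVIATIONS.contains(lastToken)) {
                flagged.add(i);
            }
        }
        return flagged;
    }

    public static void main(String[] args) {
        // Sample input modeled on the detector output shown earlier
        String[] sentences = {
            "Within sentences we may find numbers like 3.14159, "
                + "abbreviations such as found in Mr.",
            "Smith, and possibly ellipses."
        };
        for (int index : suspiciousSplits(sentences)) {
            System.out.println("Possible bad split after: " + sentences[index]);
        }
    }
}
```

A check like this only reports likely errors; it does not fix them. If it flags many sentences, that suggests the model should be retrained on text closer to what it will be used against.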
