Finding Sentences - Natural Language Processing with Java


Evaluating the model using the SentenceDetectorEvaluator class

We reserved part of the sample file for evaluation purposes so that we can use the SentenceDetectorEvaluator class to evaluate the model. We modified the sentence.train file by extracting the last ten sentences and placing them in a file called evalSample. Then we used this file to evaluate the model. In the next example, we reuse the lineStream and sampleStream variables to create a stream of SentenceSample objects based on the file's contents:

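// Read the held-out evaluation file line by line
lineStream = new PlainTextByLineStream(
    new FileReader("evalSample"));
// Wrap each line as a SentenceSample for the evaluator
sampleStream = new SentenceSampleStream(lineStream);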
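The hold-out step described earlier, moving the last ten sentences out of sentence.train and into evalSample, can be sketched in plain Java. This is only an illustration under the assumption that the training file holds one sentence per line; the HoldOutSplit class and its splitLast method are hypothetical names and are not part of OpenNLP.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper (not part of OpenNLP): splits a list of lines into a
// training part and an evaluation part made of the last evalCount lines.
final class HoldOutSplit {

    // Simple holder for the two halves of the split.
    static final class Split {
        public final List<String> train;
        public final List<String> eval;

        Split(List<String> train, List<String> eval) {
            this.train = train;
            this.eval = eval;
        }
    }

    // Keeps the first (size - evalCount) lines for training and
    // moves the remaining evalCount lines into the evaluation set.
    static Split splitLast(List<String> lines, int evalCount) {
        if (evalCount < 0 || evalCount > lines.size()) {
            throw new IllegalArgumentException("evalCount out of range: " + evalCount);
        }
        int cut = lines.size() - evalCount;
        return new Split(new ArrayList<>(lines.subList(0, cut)),
                         new ArrayList<>(lines.subList(cut, lines.size())));
    }

    public static void main(String[] args) {
        // Assumption: one sentence per line, as in a typical training file.
        List<String> lines = Arrays.asList("One.", "Two.", "Three.", "Four.", "Five.");
        Split split = splitLast(lines, 2);
        System.out.println("train: " + split.train);
        System.out.println("eval:  " + split.eval);
    }
}
```

With real files, the input list could come from Files.readAllLines and each half could be written back with Files.write. Keeping the evaluation sentences out of the training data matters because a model scored on sentences it was trained on will look better than it is.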

An instance of the SentenceDetectorEvaluator class is created using the previously created SentenceDetectorME variable, detector. The second argument of the constructor is a SentenceDetectorEvaluationMonitor object, which we do not use here, so null is passed instead. Then the evaluate method is called:

SentenceDetectorEvaluator sentenceDetectorEvaluator
    = new SentenceDetectorEvaluator(detector, null);
sentenceDetectorEvaluator.evaluate(sampleStream);

The getFMeasure method returns an instance of the FMeasure class, which provides measurements of the quality of the model:

System.out.println(sentenceDetectorEvaluator.getFMeasure());

The output follows. Precision is the fraction of the sentences found by the model that are correct, and recall is the fraction of the actual sentences that the model found, which reflects the sensitivity of the model. F-measure is a score that combines precision and recall as their harmonic mean. In essence, it reflects how well the model works. It is best to keep the precision above 90 percent for tokenization and SBD tasks:

Precision: 0.8181818181818182

Recall: 0.9

F-Measure: 0.8571428571428572
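To make the relationship between the three figures concrete, the following sketch computes precision, recall, and F-measure from raw counts. The counts used here (9 correct sentences, 11 detected, 10 in the evaluation file) are an assumption chosen because they reproduce the reported values; the evaluator does not print them. The FMeasureSketch class is hypothetical and is not OpenNLP's FMeasure implementation.

```java
// Hypothetical illustration (not OpenNLP's FMeasure class): derives the
// three scores from counts of correct, detected, and reference sentences.
final class FMeasureSketch {

    // Precision: correctly detected sentences / all sentences the model detected.
    static double precision(int truePositives, int predicted) {
        return predicted == 0 ? 0.0 : (double) truePositives / predicted;
    }

    // Recall: correctly detected sentences / all sentences in the reference data.
    static double recall(int truePositives, int reference) {
        return reference == 0 ? 0.0 : (double) truePositives / reference;
    }

    // F-measure: harmonic mean of precision and recall.
    static double fMeasure(double precision, double recall) {
        double sum = precision + recall;
        return sum == 0.0 ? 0.0 : 2 * precision * recall / sum;
    }

    public static void main(String[] args) {
        // Assumed counts: 9 correct, 11 detected, 10 reference sentences.
        double p = precision(9, 11);
        double r = recall(9, 10);
        System.out.println("Precision: " + p);
        System.out.println("Recall: " + r);
        System.out.println("F-Measure: " + fMeasure(p, r));
    }
}
```

Under these assumed counts, precision is 9/11, recall is 9/10, and the F-measure works out to 6/7, which agrees with the output shown above up to floating-point rounding. Because the harmonic mean is pulled toward the smaller of the two values, a model cannot reach a high F-measure by maximizing only one of them.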
