Finding People and Things - Natural Language Processing with Java

Evaluating a model

The model can be evaluated using the TokenNameFinderEvaluator class. The evaluation process compares the model's output against marked-up sample text. For this simple example, a file called en-ner-person.eval was created that contained the following text:

<START:person> Bill <END> went to the farm to see <START:person> Sally <END>.
Unable to find <START:person> Sally <END> he went to town.
There he saw <START:person> Fred <END> who had seen <START:person> Sally <END> at the topic store with <START:person> Mary <END>.
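To make the annotation format concrete, the following minimal sketch pulls the marked-up names out of one line of the evaluation text using a regular expression. It is only an illustration of the <START:type> ... <END> convention and is not how OpenNLP itself parses the file; the class name AnnotatedNameSketch and the method extract are hypothetical names introduced for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of OpenNLP: lists the annotated spans in one line.
class AnnotatedNameSketch {

    // Matches "<START:type> some tokens <END>" and captures the type and the tokens.
    private static final Pattern SPAN =
        Pattern.compile("<START:([^>\\s]+)>\\s+(.+?)\\s+<END>");

    // Returns the text of every span in the line whose type equals the given type.
    static List<String> extract(String line, String type) {
        List<String> names = new ArrayList<>();
        Matcher m = SPAN.matcher(line);
        while (m.find()) {
            if (m.group(1).equals(type)) {
                names.add(m.group(2));
            }
        }
        return names;
    }

    public static void main(String[] args) {
        String line = "There he saw <START:person> Fred <END> who had seen "
            + "<START:person> Sally <END> at the topic store with "
            + "<START:person> Mary <END>.";
        System.out.println(extract(line, "person")); // prints [Fred, Sally, Mary]
    }
}
```

Counting the spans this way across the whole file gives six annotated person names, which is the reference set the evaluator compares the model's output against.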

The following code is used to perform the evaluation. The previous model is wrapped in a NameFinderME instance, which is used as the argument of the TokenNameFinderEvaluator constructor. A NameSampleDataStream instance is created based on the evaluation file. The TokenNameFinderEvaluator class's evaluate method performs the evaluation:

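// Wrap the trained model in a NameFinderME and hand it to the evaluator
TokenNameFinderEvaluator evaluator =
    new TokenNameFinderEvaluator(new NameFinderME(model));
// Read the annotated evaluation file line by line
lineStream = new PlainTextByLineStream(
    new FileInputStream("en-ner-person.eval"), "UTF-8");
// Convert each annotated line into a NameSample
sampleStream = new NameSampleDataStream(lineStream);
// Run the name finder over every sample and accumulate the statistics
evaluator.evaluate(sampleStream);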

To determine how well the model worked with the evaluation data, the getFMeasure

method is executed. The results are then displayed:

FMeasure result = evaluator.getFMeasure();

System.out.println(result.toString());

The following output displays the precision, recall, and F-measure. The precision indicates that 50 percent of the entities found exactly match the evaluation data. The recall is the percentage of entities defined in the corpus that were found in the same location. The performance measure, the F-measure, is the harmonic mean of precision and recall and is defined as: F1 = 2 * Precision * Recall / (Precision + Recall)
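As a worked illustration of the formula, the following sketch computes precision, recall, and F1 from raw counts. The reference count of 6 is the number of annotated names in the evaluation file. The predicted and correct counts (4 and 2) are hypothetical values chosen only to give a precision of 50 percent; they are not the model's actual results. The class FMeasureSketch and its methods are likewise illustrative names and are not part of OpenNLP.

```java
import java.util.Locale;

// Hypothetical helper, not part of OpenNLP: computes the three scores from counts.
class FMeasureSketch {

    // Fraction of the entities the model reported that are correct.
    static double precision(int correct, int predicted) {
        return predicted == 0 ? 0.0 : (double) correct / predicted;
    }

    // Fraction of the annotated entities that the model found.
    static double recall(int correct, int reference) {
        return reference == 0 ? 0.0 : (double) correct / reference;
    }

    // Harmonic mean of precision and recall; defined as 0 when both are 0.
    static double f1(double p, double r) {
        return (p + r) == 0.0 ? 0.0 : 2 * p * r / (p + r);
    }

    public static void main(String[] args) {
        int reference = 6; // names annotated in the evaluation file
        int predicted = 4; // assumed number of names the model reported
        int correct = 2;   // assumed number of those that match exactly

        double p = precision(correct, predicted);
        double r = recall(correct, reference);
        System.out.println(String.format(Locale.ROOT,
            "Precision: %.3f Recall: %.3f F1: %.3f", p, r, f1(p, r)));
        // prints Precision: 0.500 Recall: 0.333 F1: 0.400
    }
}
```

Because the harmonic mean is pulled toward the smaller of the two values, the F1 here (0.400) sits closer to the recall than a simple average of the two scores would.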
