for the BioCreative I corpus was a precision of 0.65 and recall of 0.68. Overall
performance for the PMC corpus was a precision of 0.71 and recall of 0.62.
The system's performance on the individual equivalence classes is shown in
Table 2.
Table 2. Performance on two corpora for the predictable categories [6]
(TP = true positives, FP = false positives, FN = false negatives, P = precision, R = recall)

             BioCreative                       PubMed Central
Prediction   TP    FP    FN    P     R         TP    FP    FN    P     R
1            12    57    17    0.17  0.41      8     10    0     0.44  1.0
2            0     1     38    0.0   0.0       1     0     2     1.0   0.33
4            556   278   512   0.67  0.52      163   64    188   0.72  0.46
5            284   251   72    0.53  0.80      108   54    46    0.67  0.70
The predictions based on the test suites were almost entirely supported. The
single anomaly was the high recall observed on the PMC corpus for prediction 1,
where low recall had been predicted. In all other cases the predictions held:
where recall was predicted to be low (equivalence classes 1, 2, and 4), it was
lower than the recall for the corpus as a whole, and where it was predicted to be
high (equivalence class 5), it was higher than the corpus-wide recall.
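The comparison described in the preceding paragraph can be checked mechanically. The following Python sketch is not the authors' code. It recomputes P = TP/(TP+FP) and R = TP/(TP+FN) from the counts in Table 2, then compares each class's recall with the corpus-wide recall reported in the text (0.68 for BioCreative I, 0.62 for PMC). The names `COUNTS`, `PREDICTED`, and `check_predictions` are hypothetical. Two further choices are assumptions: "low" is read as strictly below the corpus-wide recall, and an empty denominator yields 0.0.

```python
# Hypothetical sketch (not the authors' tooling): recompute per-class precision
# and recall from Table 2 and test each predicted direction of recall.
from typing import Dict, Tuple

# (TP, FP, FN) per prediction / equivalence class, copied from Table 2.
COUNTS: Dict[str, Dict[int, Tuple[int, int, int]]] = {
    "BioCreative": {1: (12, 57, 17), 2: (0, 1, 38), 4: (556, 278, 512), 5: (284, 251, 72)},
    "PubMed Central": {1: (8, 10, 0), 2: (1, 0, 2), 4: (163, 64, 188), 5: (108, 54, 46)},
}

# Corpus-wide recall as reported in the text.
OVERALL_RECALL = {"BioCreative": 0.68, "PubMed Central": 0.62}

# Direction of recall predicted by the test suites for each class.
PREDICTED = {1: "low", 2: "low", 4: "low", 5: "high"}


def precision(tp: int, fp: int) -> float:
    """P = TP / (TP + FP); assumed to be 0.0 when the system output nothing."""
    return tp / (tp + fp) if tp + fp else 0.0


def recall(tp: int, fn: int) -> float:
    """R = TP / (TP + FN); assumed to be 0.0 when there are no gold items."""
    return tp / (tp + fn) if tp + fn else 0.0


def check_predictions() -> Dict[Tuple[str, int], bool]:
    """Map each (corpus, class) to True if the predicted recall direction held."""
    results = {}
    for corpus, classes in COUNTS.items():
        baseline = OVERALL_RECALL[corpus]
        for pred, (tp, _fp, fn) in classes.items():
            # Assumption: "low" means strictly below the corpus-wide recall.
            observed = "low" if recall(tp, fn) < baseline else "high"
            results[(corpus, pred)] = observed == PREDICTED[pred]
    return results


if __name__ == "__main__":
    for corpus, classes in COUNTS.items():
        for pred, (tp, fp, fn) in classes.items():
            print(f"{corpus:15} {pred}  P={precision(tp, fp):.2f}  R={recall(tp, fn):.2f}")
    unsupported = [key for key, ok in check_predictions().items() if not ok]
    print("Predictions not supported:", unsupported)
```

Running it prints per-class values that agree with the P and R columns of Table 2 to two decimal places. It reports `('PubMed Central', 1)` as the only unsupported prediction, which is the anomaly noted in the text.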
No results are given for prediction 3, because it concerns letter case, and
letter case had been normalized to lower case in the corpora. This again points
out an advantage of test suites: we know that such gene names exist in the
literature, but they were not represented in these corpora at all, which makes the
corpora unsuitable for assessing a system's performance on this type of name.
These findings are significant (in the non-statistical sense of that term)
because of the small numbers of items in some of the cells, not in spite of them.
Such details of performance would likely be lost in an evaluation that assessed
only precision, recall, and F-measure, yet they can make the difference between
finding and missing elusive statements that are of crucial interest to the
biologist, perhaps precisely because of their rarity.
5 An Engineering Perspective on the Use of Test Suites versus Corpora
To the extent that testing is considered in the natural language processing
community, there is an implicit assumption that the way to test an application is