ludicrous = True neg : pos = 12.6 : 1.0
uninvolving = True neg : pos = 12.3 : 1.0
astounding = True pos : neg = 11.7 : 1.0
avoids = True pos : neg = 11.7 : 1.0
fascination = True pos : neg = 11.0 : 1.0
animators = True pos : neg = 10.3 : 1.0
symbol = True pos : neg = 10.3 : 1.0
Confusion matrix:
                            Predicted class
                ----------------------------------------
Actual class    |     195 (TP)     |      5 (FN)      |
                ----------------------------------------
                |     101 (FP)     |     99 (TN)      |
                ----------------------------------------
As discussed earlier in Chapter 7, a confusion matrix is a table layout that
visualizes the performance of a model over the testing set. Every row and
column corresponds to a possible class in the dataset, and each cell in the
matrix shows the number of test examples for which the actual class is the row
and the predicted class is the column. Good results correspond to large numbers
down the main diagonal (TP and TN) and small, ideally zero, off-diagonal elements
(FP and FN). Table 9.7 shows the confusion matrix from the previous program output
for the testing set of 400 reviews. Because a well-performing classifier should have a
confusion matrix with large numbers for TP and TN and near-zero numbers for FP and
FN, it can be concluded that the naïve Bayes classifier produces many false positives
(101 of the 200 negative reviews are predicted as positive), so it does not perform
very well on this testing set.
Table 9.7 Confusion Matrix for the Example Testing Set

                              Predicted Class
                          Positive        Negative
Actual Class   Positive   195 (TP)        5 (FN)
               Negative   101 (FP)        99 (TN)
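The counts in Table 9.7 can be tallied directly from the classifier's predictions.
The sketch below is illustrative rather than the book's own program: it assumes a
trained NLTK classifier named classifier and a list of (features, label) pairs named
test_set, with "pos" and "neg" as the two class labels.

# Tally the four confusion-matrix cells for the binary sentiment task.
# Assumptions: `classifier` is a trained nltk.NaiveBayesClassifier and
# `test_set` is a list of (feature_dict, label) pairs labeled 'pos'/'neg'.
tp = fn = fp = tn = 0
for features, actual in test_set:
    predicted = classifier.classify(features)
    if actual == 'pos' and predicted == 'pos':
        tp += 1      # true positive
    elif actual == 'pos' and predicted == 'neg':
        fn += 1      # false negative
    elif actual == 'neg' and predicted == 'pos':
        fp += 1      # false positive
    else:
        tn += 1      # true negative

print('Confusion matrix:')
print('              Predicted pos   Predicted neg')
print('Actual pos    {:>8} (TP)   {:>8} (FN)'.format(tp, fn))
print('Actual neg    {:>8} (FP)   {:>8} (TN)'.format(fp, tn))

For the 400-review testing set above, this tally yields the 195/5/101/99 split shown
in Table 9.7.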
Chapter 7 introduced a few measures for evaluating the performance of a classifier
beyond the confusion matrix. Precision and recall are two measures commonly
used to evaluate text analysis tasks. Definitions of precision and recall
are given in Equations 9.8 and 9.9.
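As a quick numerical check, the sketch below applies the standard definitions that
Equations 9.8 and 9.9 formalize (precision = TP / (TP + FP), recall = TP / (TP + FN))
to the counts in Table 9.7; the low precision reflects the large number of false
positives noted above.

# Precision and recall computed from the Table 9.7 counts.
tp, fn, fp, tn = 195, 5, 101, 99

precision = float(tp) / (tp + fp)   # 195 / 296, roughly 0.659
recall = float(tp) / (tp + fn)      # 195 / 200 = 0.975

print('Precision: {:.3f}'.format(precision))
print('Recall:    {:.3f}'.format(recall))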