                       Predicted class
                       yes     no      Total
Actual class    yes    TP      FN      P
                no     FP      TN      N
                Total  P′      N′      P + N

Figure 8.14 Confusion matrix, shown with totals for positive and negative tuples.
Classes                buys_computer = yes   buys_computer = no   Total    Recognition (%)
buys_computer = yes    6954                  46                   7000      99.34
buys_computer = no     412                   2588                 3000      86.27
Total                  7366                  2634                 10,000    95.42

Figure 8.15 Confusion matrix for the classes buys_computer = yes and buys_computer = no,
where an entry in row i and column j shows the number of tuples of class i that were
labeled by the classifier as class j. Ideally, the nondiagonal entries should be zero or
close to zero.
mislabeling). Given m classes (where m ≥ 2), a confusion matrix is a table of at least
size m by m. An entry, CM_{i,j}, in the first m rows and m columns indicates the number
of tuples of class i that were labeled by the classifier as class j. For a classifier to have
good accuracy, ideally most of the tuples would be represented along the diagonal of the
confusion matrix, from entry CM_{1,1} to entry CM_{m,m}, with the rest of the entries being
zero or close to zero. That is, ideally, FP and FN are around zero.
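A minimal sketch of this tallying, in Python with hypothetical toy labels for buys_computer (the function and label lists are illustrative, not part of the text):

```python
def confusion_matrix(actual, predicted, classes):
    """Build CM where CM[i][j] counts tuples of class i labeled as class j."""
    cm = {i: {j: 0 for j in classes} for i in classes}
    for a, p in zip(actual, predicted):
        cm[a][p] += 1
    return cm

# Hypothetical toy labels; a good classifier concentrates counts on the diagonal.
actual    = ["yes", "yes", "no", "no", "yes"]
predicted = ["yes", "no",  "no", "no", "yes"]
print(confusion_matrix(actual, predicted, classes=["yes", "no"]))
# {'yes': {'yes': 2, 'no': 1}, 'no': {'yes': 0, 'no': 2}}
```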
The table may have additional rows or columns to provide totals. For example, in
the confusion matrix of Figure 8.14, P and N are shown. In addition, P′ is the number
of tuples that were labeled as positive (TP + FP) and N′ is the number of tuples that
were labeled as negative (FN + TN). The total number of tuples is TP + TN + FP + FN,
or P + N, or P′ + N′. Note that although the confusion matrix shown is for a binary
classification problem, confusion matrices can be easily drawn for multiple classes in a
similar manner.
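For instance, using the counts in Figure 8.15 with buys_computer = yes as the positive class, P′ = TP + FP = 6954 + 412 = 7366 tuples were labeled as positive and N′ = FN + TN = 46 + 2588 = 2634 were labeled as negative, which together account for all 10,000 test tuples.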
Now let's look at the evaluation measures, starting with accuracy. The accuracy of a
classifier on a given test set is the percentage of test set tuples that are correctly classified
by the classifier. That is,
    accuracy = (TP + TN) / (P + N)                                    (8.21)
In the pattern recognition literature, this is also referred to as the overall recognition
rate of the classifier; that is, it reflects how well the classifier recognizes tuples of the
various classes. An example of a confusion matrix for the two classes buys_computer = yes
(positive) and buys_computer = no (negative) is given in Figure 8.15. Totals are shown,
as well as the recognition rate per class.
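Applying Eq. (8.21) to the counts in Figure 8.15 can be sketched in a few lines of Python (the variable names here are illustrative, not from the text):

```python
# Counts taken from Figure 8.15; buys_computer = yes is the positive class.
TP, FN = 6954, 46    # actual yes tuples labeled yes / no
FP, TN = 412, 2588   # actual no tuples labeled yes / no

P, N = TP + FN, FP + TN             # actual positives and negatives
accuracy = (TP + TN) / (P + N)      # Eq. (8.21)

print(f"accuracy = {accuracy:.2%}")             # accuracy = 95.42%
print(f"recognition of 'yes' = {TP / P:.2%}")   # 99.34%
print(f"recognition of 'no'  = {TN / N:.2%}")   # 86.27%
```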
 