7.3 Diagnostics of Classifiers
So far, this topic has covered three classifiers: logistic regression, decision trees,
and naïve Bayes. These methods classify instances into distinct groups according
to the characteristics they share, and each faces the same question: how to evaluate
whether it performs well.
Several tools have been designed to evaluate the performance of a classifier. Such
tools are not limited to the three classifiers in this topic but serve the purpose
of assessing classifiers in general.
A confusion matrix is a specific table layout that allows visualization of the
performance of a classifier.
Table 7.6 shows the confusion matrix for a two-class classifier. True positives
(TP) are the number of positive instances the classifier correctly identified as
positive. False positives (FP) are the number of instances the classifier
identified as positive but that are in reality negative. True negatives (TN) are the
number of negative instances the classifier correctly identified as negative. False
negatives (FN) are the number of instances classified as negative but that are in
reality positive. In a two-class classification, a preset threshold may be used to
separate positives from negatives. TP and TN are the correct guesses. A good
classifier should have large TP and TN counts and small (ideally zero) FP and FN counts.
Table 7.6 Confusion Matrix

                              Predicted Class
                              Positive               Negative
  Actual Class   Positive     True Positives (TP)    False Negatives (FN)
                 Negative     False Positives (FP)   True Negatives (TN)
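The four cells of Table 7.6 can be tallied directly from paired lists of actual and predicted labels. The sketch below assumes illustrative label lists and a positive label of "yes"; none of these names come from the text.

```python
def confusion_counts(actual, predicted, positive="yes"):
    """Count TP, FN, FP, TN for a two-class classifier."""
    tp = fn = fp = tn = 0
    for a, p in zip(actual, predicted):
        if a == positive:
            if p == positive:
                tp += 1  # positive correctly identified as positive
            else:
                fn += 1  # positive missed by the classifier
        else:
            if p == positive:
                fp += 1  # negative wrongly called positive
            else:
                tn += 1  # negative correctly identified as negative
    return tp, fn, fp, tn

# Hypothetical labels for six instances
actual    = ["yes", "yes", "no", "no",  "no", "yes"]
predicted = ["yes", "no",  "no", "yes", "no", "yes"]
tp, fn, fp, tn = confusion_counts(actual, predicted)  # (2, 1, 1, 2)
```

A good classifier concentrates its counts in `tp` and `tn`, leaving `fp` and `fn` near zero.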
In the bank marketing example, the training set includes 2,000 instances. An
additional 100 instances are included as the testing set. Table 7.7 shows the
confusion matrix of a naïve Bayes classifier on 100 clients, predicting whether they
would subscribe to the term deposit. Of the 11 clients who subscribed to the term
deposit, the model predicted 3 subscribed and 8 not subscribed. Similarly, of the
89 clients who did not subscribe to the term deposit, the model predicted 2 subscribed
and 87 not subscribed. All correct guesses lie on the diagonal from top left to bottom
right of the table. This makes it easy to visually inspect the table for errors:
they appear as any nonzero values off the diagonal.
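As a quick numerical check on these counts, the diagonal and off-diagonal cells of Table 7.7 can be tallied to get the overall fractions of correct and incorrect guesses. The accuracy computation here is a simple sketch; only the four counts come from the text.

```python
# Counts from Table 7.7 (bank marketing test set of 100 clients)
tp, fn = 3, 8    # 11 actual subscribers: 3 predicted correctly, 8 missed
fp, tn = 2, 87   # 89 non-subscribers: 2 wrongly predicted, 87 correct

total = tp + fn + fp + tn         # 100 test clients
correct = tp + tn                 # diagonal entries: correct guesses
errors = fp + fn                  # off-diagonal entries: errors
accuracy = correct / total        # 90 / 100 = 0.9
```

Note that despite 90% accuracy, the classifier catches only 3 of the 11 actual subscribers, which is why the matrix is inspected cell by cell rather than summarized by a single number.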