Measure                                        Formula
-----------------------------------------------------------------------------------
accuracy, recognition rate                     (TP + TN) / (P + N)
error rate, misclassification rate             (FP + FN) / (P + N)
sensitivity, true positive rate, recall        TP / P
specificity, true negative rate                TN / N
precision                                      TP / (TP + FP)
F, F1, F-score, harmonic mean of               (2 × precision × recall) /
  precision and recall                           (precision + recall)
F_beta, where beta is a                        ((1 + beta²) × precision × recall) /
  non-negative real number                       (beta² × precision + recall)
Figure 8.13 Evaluation measures. Note that some measures are known by more than one name. TP, TN, FP, FN, P, N refer to the number of true positive, true negative, false positive, false negative, positive, and negative samples, respectively (see text).
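As a quick illustration, the formulas in Figure 8.13 can be sketched in Python from the four counts alone (the function and key names here are mine, not from the text):

```python
def evaluation_measures(TP, TN, FP, FN, beta=1.0):
    """Compute the measures of Figure 8.13 from the four outcome counts."""
    P = TP + FN          # number of positive tuples
    N = TN + FP          # number of negative tuples
    precision = TP / (TP + FP)
    recall = TP / P      # sensitivity, true positive rate
    return {
        "accuracy": (TP + TN) / (P + N),
        "error_rate": (FP + FN) / (P + N),
        "sensitivity": recall,
        "specificity": TN / N,
        "precision": precision,
        "F": 2 * precision * recall / (precision + recall),
        "F_beta": (1 + beta**2) * precision * recall
                  / (beta**2 * precision + recall),
    }
```

Note that with beta = 1, the F_beta formula reduces to the harmonic mean F (the F1 score), which is why the two entries in the table agree in that case.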
buys_computer = no. Suppose we use our classifier on a test set of labeled tuples.
P
is the
number of positive tuples and
N
is the number of negative tuples. For each tuple, we
compare the classifier's class label prediction with the tuple's known class label.
There are four additional terms we need to know that are the “building blocks” used
in computing many evaluation measures. Understanding them will make it easy to grasp
the meaning of the various measures.
True positives (TP): These refer to the positive tuples that were correctly labeled by the classifier. Let TP be the number of true positives.

True negatives (TN): These are the negative tuples that were correctly labeled by the classifier. Let TN be the number of true negatives.

False positives (FP): These are the negative tuples that were incorrectly labeled as positive (e.g., tuples of class buys_computer = no for which the classifier predicted buys_computer = yes). Let FP be the number of false positives.

False negatives (FN): These are the positive tuples that were mislabeled as negative (e.g., tuples of class buys_computer = yes for which the classifier predicted buys_computer = no). Let FN be the number of false negatives.
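Assuming each test tuple's known label and predicted label are available as two parallel lists, the four counts above might be tallied as follows (a sketch; the function name and list representation are my assumptions, not from the text):

```python
def count_outcomes(actual, predicted, positive="yes"):
    """Tally TP, TN, FP, FN by comparing each prediction with the known label."""
    TP = TN = FP = FN = 0
    for a, p in zip(actual, predicted):
        if a == positive and p == positive:
            TP += 1      # positive tuple, correctly labeled
        elif a != positive and p != positive:
            TN += 1      # negative tuple, correctly labeled
        elif a != positive and p == positive:
            FP += 1      # negative tuple mislabeled as positive
        else:
            FN += 1      # positive tuple mislabeled as negative
    return TP, TN, FP, FN
```

For example, with actual labels ["yes", "yes", "no", "no"] and predictions ["yes", "no", "yes", "no"], the tally is one of each outcome.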
These terms are summarized in the
confusion matrix
of Figure 8.14.
The confusion matrix is a useful tool for analyzing how well your classifier can
recognize tuples of different classes.
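Given the four counts, the matrix can be rendered as a quick text display. The layout below, actual classes as rows and predicted classes as columns, is an assumption on my part, since Figure 8.14 itself is not reproduced here:

```python
def format_confusion_matrix(TP, TN, FP, FN):
    """Render the outcome counts as a 2x2 confusion matrix:
    actual classes as rows, predicted classes as columns (layout assumed)."""
    header  = f"{'':>8}{'pred:yes':>10}{'pred:no':>10}"
    row_yes = f"{'yes':>8}{TP:>10}{FN:>10}"
    row_no  = f"{'no':>8}{FP:>10}{TN:>10}"
    return "\n".join([header, row_yes, row_no])
```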
TP
and
TN
tell us when the classifier is getting
things right, while
FP
and
FN
tell us when the classifier is getting things wrong (i.e.,