example with the lowest score is assigned the rank 1, so that ranks increase with
the classification score. Then, we can calculate the AUC as:
\[
\mathrm{AUC}(f) \;=\; \frac{\sum_{i=1}^{|T_p|} \left( R_i - i \right)}{|T_p|\,|T_n|}
\]
where $T_p \subseteq T$ and $T_n \subseteq T$ are, respectively, the subsets of positive and negative
examples in the test set $T$, and $R_i$ is the rank of the $i$th example in $T_p$ given by
classifier $f$.
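As a concrete illustration, this rank-based formula can be computed directly from a set of classification scores. The short Python sketch below (the scores and labels are made up for this example) assigns ranks in increasing order of score, applies the formula, and cross-checks the result against an explicit count of correctly ordered positive-negative pairs.

```python
# Minimal sketch of the rank-based AUC formula above.
# The scores and labels are invented for illustration only.

def auc_from_ranks(scores, labels):
    """AUC(f) = sum_i (R_i - i) / (|Tp| * |Tn|), with rank 1 given to the
    lowest score (ranks increase with the classification score)."""
    order = sorted(range(len(scores)), key=lambda j: scores[j])
    rank = {j: r + 1 for r, j in enumerate(order)}          # 1 = lowest score
    pos_ranks = sorted(rank[j] for j in range(len(labels)) if labels[j] == 1)
    n_pos = len(pos_ranks)
    n_neg = len(labels) - n_pos
    return sum(r - (i + 1) for i, r in enumerate(pos_ranks)) / (n_pos * n_neg)


def auc_pairwise(scores, labels):
    """Fraction of (positive, negative) pairs in which the positive example
    receives the higher score (ties counted as half): the probabilistic
    reading of the AUC discussed in the text."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    labels = [1, 0, 1, 1, 0, 0, 1, 0]                       # hypothetical test labels
    scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]       # hypothetical classifier scores
    print(auc_from_ranks(scores, labels))                   # 0.6875
    print(auc_pairwise(scores, labels))                     # 0.6875 (same value)
```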
The AUC essentially measures the probability that the classifier assigns a higher
rank to a randomly chosen positive example than to a randomly chosen negative
example. Although the AUC is intended as a summary statistic, like other
single-metric performance measures it loses significant information about the
behavior of the learning algorithm over the entire operating range (for instance,
it misses information on concavities in the performance, or on trade-off behaviors
between the TP and FP rates).
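To make this loss of information concrete, the following sketch (the two rankings are constructed by hand for this illustration) builds the ROC points of two hypothetical classifiers over the same eight test examples. Both obtain exactly the same AUC of 0.75, yet one dominates at low false-positive rates and the other at high ones, so their ROC curves cross and the single summary number hides the trade-off.

```python
# Two hypothetical classifiers ranking the same 8 test examples differently.
# 1 = positive, 0 = negative, listed from highest score to lowest.
ranking_a = [1, 1, 0, 0, 1, 1, 0, 0]   # strong at the very top of the ranking
ranking_b = [0, 1, 1, 1, 1, 0, 0, 0]   # weak at the very top, strong just below

def roc_points(ranking):
    """ROC curve as (FPR, TPR) points obtained by sweeping the decision
    threshold down the ranking (highest score first)."""
    n_pos, n_neg = sum(ranking), len(ranking) - sum(ranking)
    tp = fp = 0
    points = [(0.0, 0.0)]
    for y in ranking:
        if y == 1:
            tp += 1
        else:
            fp += 1
        points.append((fp / n_neg, tp / n_pos))
    return points

def auc(points):
    """Area under the ROC curve via the trapezoidal rule."""
    return sum((x2 - x1) * (y1 + y2) / 2
               for (x1, y1), (x2, y2) in zip(points, points[1:]))

a, b = roc_points(ranking_a), roc_points(ranking_b)
print(auc(a), auc(b))   # both 0.75, yet the curves cross:
print(a)                # reaches TPR = 0.5 already at FPR = 0
print(b)                # reaches TPR = 1.0 already at FPR = 0.25
```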
It can be argued that the AUC is a good way to get a score for the general
performance of a classifier and to compare it to that of another classifier. This
is particularly true in the case of imbalanced data where, as discussed earlier,
accuracy is too strongly biased toward the dominant class. However, some criticisms
have also appeared warning against the use of AUC across classifiers for
comparative purposes. One of the most obvious ones is that if the ROC curves
of the two classifiers intersect (such as in the case of Figure 8.2), then the
AUC-based comparison between the classifiers can be relatively uninformative and
even misleading. However, a possibly more serious limitation of the AUC for
comparative purposes lies in the fact that the misclassification cost distributions
(and hence the skew-ratio distributions) used by the AUC are different for
different classifiers. This is discussed in the next subsection, which generally looks
at newer and more experimental ranking metrics and graphical methods.
8.4.5 Newer Ranking Metrics and Graphical Methods
8.4.5.1 The H-Measure The more serious criticism of the AUC just mentioned
means that, when comparing different classifiers using the AUC, one may in
fact be comparing apples and oranges, since the AUC may give more weight to
a point misclassified by classifier A than to the same point misclassified by
classifier B. This is because the AUC uses an implicit weight function that
varies from classifier to classifier. This criticism was made by Hand [16], who
also proposed the H-measure to remedy the problem. The H-measure lets the user
select a cost-weight function that is the same for all the classifiers under
comparison and thus allows for fairer comparisons. The formulation of the
H-measure is somewhat involved and will not be discussed here. The reader is
referred to [16] for further details about the H-measure, as well as for a
pointer to R code implementing it.
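To make the common-weight idea concrete without reproducing Hand's formulation, the sketch below illustrates the principle only: each classifier is evaluated by its expected minimum misclassification cost, where the cost ratio c is drawn from one distribution that the user fixes once and shares across all classifiers under comparison. The Beta(2, 2) weight, the threshold grid, and the scores are illustrative assumptions; this is not the H-measure as defined in [16].

```python
# Simplified sketch of the shared-cost-weight principle behind the H-measure:
# every classifier is scored against the SAME user-chosen distribution over the
# misclassification-cost ratio c, rather than the classifier-dependent implicit
# weights of the AUC.  Illustration of the idea only, not Hand's formulation
# (see [16] for that and for the reference R code).

def expected_min_cost(scores, labels, cost_weights):
    """Weighted average, over (c, w) pairs, of the minimum achievable loss
    c*pi_pos*FNR(t) + (1-c)*pi_neg*FPR(t) when the threshold t is chosen
    optimally for each cost ratio c."""
    n = len(labels)
    n_pos = sum(labels)
    n_neg = n - n_pos
    pi_pos, pi_neg = n_pos / n, n_neg / n

    # Candidate thresholds: every distinct score, plus one below the minimum
    # so that "predict everything positive" is also considered.
    thresholds = sorted(set(scores))
    thresholds = [thresholds[0] - 1.0] + thresholds

    def rates(t):
        # Predict positive when score > t.
        fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s <= t)
        fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s > t)
        return fn / n_pos, fp / n_neg

    total_w = sum(w for _, w in cost_weights)
    loss = 0.0
    for c, w in cost_weights:
        best = min(c * pi_pos * fnr + (1 - c) * pi_neg * fpr
                   for fnr, fpr in (rates(t) for t in thresholds))
        loss += w * best
    return loss / total_w

# A shared, user-chosen weight over cost ratios c in (0, 1): here a Beta(2, 2)
# density evaluated on a grid (an assumption made for this illustration).
grid = [(i + 0.5) / 50 for i in range(50)]
weights = [(c, 6 * c * (1 - c)) for c in grid]

labels   = [1, 0, 1, 1, 0, 0, 1, 0]                      # hypothetical test set
scores_a = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]      # classifier A (made up)
scores_b = [0.6, 0.1, 0.9, 0.8, 0.3, 0.2, 0.7, 0.4]      # classifier B (made up)

# Lower expected minimum cost is better; both classifiers face the same weights.
print(expected_min_cost(scores_a, labels, weights))
print(expected_min_cost(scores_b, labels, weights))
```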
It is worth noting, however, that Hand's [16] criticism was recently challenged
by Flach et al. [17], who found that the criticism may only hold when the AUC is