curves, in order to compare two classifiers, both the class distribution and misclassification costs must be known or estimated. They argue that misclassification costs are often difficult to estimate, and that B-ROC analysis allows them to bypass the issue of misclassification cost estimation altogether.
8.5 CONCLUSION
The purpose of this chapter was to discuss the issue of classifier evaluation in the case of class-imbalanced datasets. The discussion focused on evaluation metrics and graphical methods, both those in common use and those more recently proposed and less well known. In particular, it looked at the following well-known single-class focus metrics belonging to the threshold category: sensitivity and specificity, precision and recall, and their respective combinations, the G-mean and the F-measure. It then considered a multi-class focus threshold metric that combines the partial accuracies of both classes, the macro-average accuracy. Still within the threshold category, other newer and more experimental combination metrics were surveyed: mean-class-weighted accuracy, optimized precision, the AGms, and the IBA. The chapter then discussed the following methods and metrics belonging to the ranking category: ROC analysis, cost curves, PR curves, and the AUC. Still in the ranking category, the chapter discussed newer and more experimental methods and metrics: the H-measure, the Area under the Precision-Recall Curve and the Integrated Area under the ROC Curve, and B-ROC analysis.
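
To make the threshold metrics concrete, the following minimal sketch in Python (with NumPy) computes them from a binary confusion matrix. The helper name imbalance_metrics and the example counts are ours, not the chapter's, and the F-measure shown is the balanced (F1) variant that weights precision and recall equally.

    import numpy as np

    def imbalance_metrics(tp, fn, fp, tn):
        """Threshold metrics from a binary confusion matrix.

        The positive (minority) class is taken to be the class of interest.
        """
        sensitivity = tp / (tp + fn)          # recall on the positive class
        specificity = tn / (tn + fp)          # recall on the negative class
        precision = tp / (tp + fp)
        g_mean = np.sqrt(sensitivity * specificity)
        f_measure = 2 * precision * sensitivity / (precision + sensitivity)
        macro_avg_accuracy = (sensitivity + specificity) / 2
        return {
            "sensitivity": sensitivity,
            "specificity": specificity,
            "precision": precision,
            "G-mean": g_mean,
            "F-measure": f_measure,
            "macro-average accuracy": macro_avg_accuracy,
        }

    # Hypothetical test set of 100 examples, 10 of them positive: overall
    # accuracy is 92%, yet only half of the minority class is recovered.
    print(imbalance_metrics(tp=5, fn=5, fp=3, tn=87))

In this hypothetical example, the G-mean (about 0.70), F-measure (about 0.56), and macro-average accuracy (about 0.73) all expose a weakness on the minority class that the 92% overall accuracy conceals.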
Mainly because aspects of classifier evaluation other than evaluation metrics and graphical methods have received very little discussion in the context of class imbalance, this chapter did not delve into the issues of error estimation and statistical testing, which are also important aspects of the evaluation process. We will now say a few words about these issues, which warrant much greater attention in the future.
In the case of error estimation, the main method used in the machine learning community is 10-fold cross-validation. As mentioned in the introduction, when such a method is used, the random partitioning of the data into 10 subsets can distort the results: some of the subsets may contain few or even no instances of the minority class if the dataset is extremely imbalanced or very small. This is well documented, and it is common, in the case of class imbalances in particular, to use stratified 10-fold cross-validation, which ensures that the proportion of positive to negative examples found in the original distribution is respected in all the folds (see [1] for a description of this approach). Going beyond this simple issue, however, both Japkowicz and Shah [1] and Raeder et al. [2] discuss the issue of error estimation in the general case. They both conclude that while 10-fold cross-validation is quite reliable in most cases, more research is necessary to establish ideal resampling strategies for all settings. Raeder et al. [2] specifically mention the class imbalance situation, where they suggest that additional experiments are needed.
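
As a small illustration of why stratification matters, the sketch below (in Python, using scikit-learn's KFold and StratifiedKFold; the 10%-minority dataset is synthetic and of our own making) counts how many minority-class examples land in each test fold under the two partitioning schemes.

    import numpy as np
    from sklearn.model_selection import KFold, StratifiedKFold

    # Synthetic, imbalanced dataset (hypothetical): 100 examples, only 10
    # of which belong to the positive (minority) class.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 2))
    y = np.array([1] * 10 + [0] * 90)

    splitters = {
        "plain 10-fold": KFold(n_splits=10, shuffle=True, random_state=0),
        "stratified 10-fold": StratifiedKFold(n_splits=10, shuffle=True,
                                              random_state=0),
    }

    for name, splitter in splitters.items():
        # Number of minority-class examples in each test fold.
        positives = [int(y[test_idx].sum())
                     for _, test_idx in splitter.split(X, y)]
        print(f"{name}: positives per test fold = {positives}")

    # Plain 10-fold CV may leave some test folds with few or no minority
    # examples, making per-fold sensitivity or precision undefined;
    # stratified 10-fold CV preserves the 10% class proportion in every
    # fold (exactly one positive per fold here).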