to be able to make the kind of statements they made in the general case, when
class imbalances are present.
With respect to the issue of statistical testing, we found only one paper, by Keller et al. [19], that examined the effect of class imbalances. In this paper, the authors show that a particular test, the bootstrap percentile test for the difference in the F1 measure between two classifiers, is conservative in the absence
of a skew, but becomes optimistically biased for close models in the presence of
a skew. This points to a potential problem with statistical testing on imbalanced data, but the study is too brief for general conclusions to be drawn. Once again, much more research on this issue would be needed before stronger statements can be made.
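To make the kind of test discussed above concrete, the following is a minimal sketch of a paired bootstrap percentile test for the difference in F1 between two classifiers evaluated on the same test set. It is an illustration only, not the exact procedure studied by Keller et al. [19]; the function name bootstrap_f1_difference_test and the default settings are assumptions.

```python
import numpy as np
from sklearn.metrics import f1_score

def bootstrap_f1_difference_test(y_true, pred_a, pred_b,
                                 n_boot=2000, alpha=0.05, seed=0):
    """Paired bootstrap percentile test for the difference in F1
    between two classifiers evaluated on the same test set.

    Returns the observed difference, the percentile confidence
    interval, and whether 0 lies outside that interval.
    """
    rng = np.random.default_rng(seed)
    y_true = np.asarray(y_true)
    pred_a = np.asarray(pred_a)
    pred_b = np.asarray(pred_b)
    n = len(y_true)

    observed = f1_score(y_true, pred_a) - f1_score(y_true, pred_b)

    diffs = np.empty(n_boot)
    for b in range(n_boot):
        # Resample test cases with replacement, keeping the pairing
        # between the two classifiers' predictions.
        idx = rng.integers(0, n, size=n)
        diffs[b] = (f1_score(y_true[idx], pred_a[idx], zero_division=0)
                    - f1_score(y_true[idx], pred_b[idx], zero_division=0))

    lower, upper = np.percentile(diffs, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    significant = not (lower <= 0.0 <= upper)   # reject if 0 lies outside the interval
    return observed, (lower, upper), significant
```

Heavily skewed data combined with two models of similar quality is precisely the setting in which [19] reports this kind of percentile test becoming optimistically biased.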
We conclude this chapter by mentioning one aspect of classifier evaluation that has not been discussed so far: evaluation in imbalanced multi-class classification problems. A number of the previously discussed methods have been extended to multi-class imbalanced datasets. On the ranking-method front, these include multi-class ROC graphs, which generate as many ROC curves as there are classes [20]; the multi-class AUC, which computes the weighted average of all the AUCs produced by the multi-class ROC graph just mentioned; and a skew-sensitive version of this multi-class AUC [21]. On the threshold-metric front, two metrics have been used in the community: misclassification costs and a multi-class extension of the G-mean (see references to these works in [5] and a description in [3]).
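As an illustration of how such multi-class extensions are typically computed, the sketch below implements a prevalence-weighted one-vs-rest multi-class AUC and a geometric-mean extension of the G-mean. This is a minimal sketch under the assumption that the weighted multi-class AUC averages per-class one-vs-rest AUCs by class prevalence, which is one common reading of [20, 21] but not necessarily their exact definition; the function names weighted_ovr_auc and multiclass_g_mean are hypothetical.

```python
import numpy as np
from sklearn.metrics import roc_auc_score, recall_score

def weighted_ovr_auc(y_true, y_score, classes):
    """Prevalence-weighted average of one-vs-rest AUCs.

    y_score is an (n_samples, n_classes) array of class scores,
    with columns ordered as in `classes`.
    """
    y_true = np.asarray(y_true)
    weights = np.array([(y_true == c).mean() for c in classes])
    aucs = np.array([
        roc_auc_score((y_true == c).astype(int), y_score[:, i])
        for i, c in enumerate(classes)
    ])
    return float(np.sum(weights * aucs))

def multiclass_g_mean(y_true, y_pred):
    """Geometric mean of per-class recalls, one common multi-class
    extension of the two-class G-mean."""
    recalls = recall_score(y_true, y_pred, average=None)
    return float(np.prod(recalls) ** (1.0 / len(recalls)))
```

For the first function, scikit-learn's roc_auc_score with multi_class='ovr' and average='weighted' computes a comparable weighted one-vs-rest AUC and can serve as a cross-check.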
8.6 ACKNOWLEDGMENTS
The author gratefully acknowledges the financial support of the Natural Sciences and Engineering Research Council of Canada through its Discovery Grant Program.
REFERENCES
1. N. Japkowicz and M. Shah, Evaluating Learning Algorithms: A Classification Perspective. (New York, USA), Cambridge University Press, 2011.
2. T. Raeder, T. R. Hoens, and N. V. Chawla, “Consequences of variability in classifier
performance estimates,” in ICDM (Sydney, Australia), pp. 421-430, IEEE Computer
Society, 2010.
3. C. Ferri, J. Hernández-Orallo, and R. Modroiu, "An experimental comparison of performance measures for classification," Pattern Recognition Letters, vol. 30, pp. 27-38, 2009.
4. R. Caruana and A. Niculescu-Mizil, "Data mining in metric space: An empirical analysis of supervised learning performance criteria," in Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, (Seattle, WA), pp. 69-78, 2004.
5. H. He and E. A. Garcia, “Learning from imbalanced data,” IEEE Transactions on
Knowledge and Data Engineering , vol. 21, no. 9, pp. 1263-1284, 2009.