to be able to make the kind of statements they made in the general case, when
class imbalances are present.
With respect to the issue of statistical testing, we found only one paper, by Keller et al. [19], that examined the effect of class imbalances. In this paper, the authors show that a particular test, the bootstrap percentile test for the difference in the F1 measure between two classifiers, is conservative in the absence
of a skew, but becomes optimistically biased for close models in the presence of
a skew. This points to a potential problem with statistical testing on imbalanced data, but the study is too brief for general conclusions to be drawn. Once again, much more research on this issue would be needed before stronger statements can be made.
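To make the kind of test discussed above concrete, the following is a minimal sketch of a paired bootstrap percentile test for the difference in F1 between two classifiers evaluated on the same test set. It is an illustration only, not the exact procedure studied by Keller et al. [19]; the function name bootstrap_f1_difference_test and the default settings are assumptions.

```python
import numpy as np
from sklearn.metrics import f1_score

def bootstrap_f1_difference_test(y_true, pred_a, pred_b,
                                 n_boot=2000, alpha=0.05, seed=0):
    """Paired bootstrap percentile test for the difference in F1
    between two classifiers evaluated on the same test set.

    Returns the observed difference, the percentile confidence
    interval, and whether 0 lies outside that interval.
    """
    rng = np.random.default_rng(seed)
    y_true = np.asarray(y_true)
    pred_a = np.asarray(pred_a)
    pred_b = np.asarray(pred_b)
    n = len(y_true)

    observed = f1_score(y_true, pred_a) - f1_score(y_true, pred_b)

    diffs = np.empty(n_boot)
    for b in range(n_boot):
        # Resample test cases with replacement, keeping the pairing
        # between the two classifiers' predictions.
        idx = rng.integers(0, n, size=n)
        diffs[b] = (f1_score(y_true[idx], pred_a[idx], zero_division=0)
                    - f1_score(y_true[idx], pred_b[idx], zero_division=0))

    lower, upper = np.percentile(diffs, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    significant = not (lower <= 0.0 <= upper)   # reject if 0 lies outside the interval
    return observed, (lower, upper), significant
```

Heavily skewed data combined with two models of similar quality is precisely the setting in which [19] reports this kind of percentile test becoming optimistically biased.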
We conclude this chapter by mentioning one aspect of classifier evaluation that has not been discussed so far: evaluation in imbalanced multi-class classification problems. A number of the previously discussed methods have been extended to multi-class imbalanced datasets. On the ranking-method front, these include multi-class ROC graphs, which generate as many ROC curves as there are classes [20]; the multi-class AUC, which computes the weighted average of all the AUCs produced by the multi-class ROC graph just mentioned; and a skew-sensitive version of this multi-class AUC [21]. On the threshold-metric front, two metrics have been used in the community: misclassification costs and a multi-class extension of the G-mean (see references to these works in [5] and a description in [3]).
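As an illustration of how such multi-class extensions are typically computed, the sketch below implements a prevalence-weighted one-vs-rest multi-class AUC and a geometric-mean extension of the G-mean. This is a minimal sketch under the assumption that the weighted multi-class AUC averages per-class one-vs-rest AUCs by class prevalence, which is one common reading of [20, 21] but not necessarily their exact definition; the function names weighted_ovr_auc and multiclass_g_mean are hypothetical.

```python
import numpy as np
from sklearn.metrics import roc_auc_score, recall_score

def weighted_ovr_auc(y_true, y_score, classes):
    """Prevalence-weighted average of one-vs-rest AUCs.

    y_score is an (n_samples, n_classes) array of class scores,
    with columns ordered as in `classes`.
    """
    y_true = np.asarray(y_true)
    weights = np.array([(y_true == c).mean() for c in classes])
    aucs = np.array([
        roc_auc_score((y_true == c).astype(int), y_score[:, i])
        for i, c in enumerate(classes)
    ])
    return float(np.sum(weights * aucs))

def multiclass_g_mean(y_true, y_pred):
    """Geometric mean of per-class recalls, one common multi-class
    extension of the two-class G-mean."""
    recalls = recall_score(y_true, y_pred, average=None)
    return float(np.prod(recalls) ** (1.0 / len(recalls)))
```

For the first function, scikit-learn's roc_auc_score with multi_class='ovr' and average='weighted' computes a comparable weighted one-vs-rest AUC and can serve as a cross-check.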
8.6 ACKNOWLEDGMENTS
The author gratefully acknowledges the financial support of the Natural Sciences and Engineering Research Council of Canada through its Discovery Grant Program.
REFERENCES
1. N. Japkowicz and M. Shah, Evaluating Learning Algorithms: A Classification Perspective. (New York, USA), Cambridge University Press, 2011.
2. T. Raeder, T. R. Hoens, and N. V. Chawla, “Consequences of variability in classifier
performance estimates,” in ICDM (Sydney, Australia), pp. 421-430, IEEE Computer
Society, 2010.
3. C. Ferri, J. Hernández-Orallo, and R. Modroiu, "An experimental comparison of performance measures for classification," Pattern Recognition Letters, vol. 30, pp. 27-38, 2009.
4. R. Caruana and A. Niculescu-Mizil, "Data mining in metric space: An empirical analysis of supervised learning performance criteria," in Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, (Seattle, WA), pp. 69-78, 2004.
5. H. He and E. A. Garcia, “Learning from imbalanced data,” IEEE Transactions on
Knowledge and Data Engineering , vol. 21, no. 9, pp. 1263-1284, 2009.