such as EasyEnsemble and BalanceCascade. It should be noted that a CIL method such as under-sampling that invokes an ensemble learning algorithm to train its classifier is quite different from an ensemble method designed for CIL: the invoked ensemble method plays no role in handling the imbalanced data and is not itself part of the CIL method.
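To make the distinction concrete, below is a minimal EasyEnsemble-style sketch in Python, assuming scikit-learn's AdaBoostClassifier as the invoked ensemble learner and 0/1 class labels; the names easy_ensemble and ensemble_predict are illustrative, and averaging predicted probabilities simplifies the original combination rule, which merges the weak learners of all sub-ensembles directly.

```python
import numpy as np
from sklearn.ensemble import AdaBoostClassifier

def easy_ensemble(X, y, n_subsets=10, minority_label=1, seed=None):
    # Train one AdaBoost sub-ensemble on each balanced subset: all minority
    # examples plus an equal-size random sample of the majority class.
    rng = np.random.default_rng(seed)
    minority = np.flatnonzero(y == minority_label)
    majority = np.flatnonzero(y != minority_label)
    learners = []
    for _ in range(n_subsets):
        sampled = rng.choice(majority, size=len(minority), replace=False)
        idx = np.concatenate([minority, sampled])
        learners.append(AdaBoostClassifier(n_estimators=50).fit(X[idx], y[idx]))
    return learners

def ensemble_predict(learners, X):
    # Simplified combination: average the positive-class probabilities of all
    # sub-ensembles (assumes labels {0, 1}, so column 1 is the positive class).
    prob = np.mean([c.predict_proba(X)[:, 1] for c in learners], axis=0)
    return (prob >= 0.5).astype(int)
```

Each sub-ensemble sees a balanced training set, yet no minority example is discarded, even though every individual subset under-samples the majority class.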
Some empirical results suggest the following:
1. Compared with standard ensemble methods, under-sampling and over-sampling are generally not effective when an ensemble learning algorithm such as AdaBoost is invoked to train the classifier. SMOTE sampling and combinations of different sampling methods are better choices in such cases (a minimal SMOTE sketch follows this list).
2. Many of the ensemble methods designed for CIL are significantly better than standard ensemble methods and sampling-based CIL methods, especially Chan and Stolfo's method, EasyEnsemble, and BalanceCascade.
3. Stacking may be harmful when used to handle imbalanced data. It runs a high risk of over-fitting when the minority class examples are rare and are used multiple times.
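As noted in point 1, SMOTE sampling is often a good companion to boosting. Here is a minimal SMOTE sketch, assuming scikit-learn's NearestNeighbors; the helper name smote and its parameters are illustrative. The synthetic minority examples it produces can be added to the training set before invoking AdaBoost.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def smote(X_min, n_new, k=5, seed=None):
    # Create n_new synthetic minority examples by interpolating each chosen
    # minority example toward one of its k nearest minority-class neighbours.
    rng = np.random.default_rng(seed)
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X_min)  # +1: self is nearest
    _, idx = nn.kneighbors(X_min)
    base = rng.integers(0, len(X_min), size=n_new)         # seed examples
    neigh = idx[base, rng.integers(1, k + 1, size=n_new)]  # skip column 0 (self)
    gap = rng.random((n_new, 1))                           # interpolation factor
    return X_min[base] + gap * (X_min[neigh] - X_min[base])
```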
Most of the ensemble methods for CIL deal with binary-class problems; how to use ensemble learning to help multiclass problems is an interesting direction. Moreover, although the goal of CIL is to achieve a higher AUC value, F-measure, or G-mean, most of the methods do not maximize these measures directly. Using ensemble methods to tackle this problem is a promising direction.
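For reference, the F-measure and G-mean mentioned above follow directly from the confusion matrix; the small sketch below computes both according to their standard definitions (the helper name gmean_fmeasure is illustrative, and zero denominators are not guarded).

```python
import numpy as np

def gmean_fmeasure(y_true, y_pred, pos=1):
    # G-mean = sqrt(TPR * TNR); F-measure = harmonic mean of precision and recall.
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_pred == pos) & (y_true == pos))
    tn = np.sum((y_pred != pos) & (y_true != pos))
    fp = np.sum((y_pred == pos) & (y_true != pos))
    fn = np.sum((y_pred != pos) & (y_true == pos))
    tpr = tp / (tp + fn)        # recall on the minority (positive) class
    tnr = tn / (tn + fp)        # specificity on the majority class
    precision = tp / (tp + fp)
    return np.sqrt(tpr * tnr), 2 * precision * tpr / (precision + tpr)
```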
REFERENCES
1. F.-J. Huang, Z.-H. Zhou, H.-J. Zhang, and T. Chen, "Pose invariant face recognition," in Proceedings of the 4th IEEE International Conference on Automatic Face and Gesture Recognition, (Grenoble, France), pp. 245-250, 2000.
2. P. Viola and M. Jones, "Fast and robust classification using asymmetric AdaBoost and a detector cascade," in Advances in Neural Information Processing Systems 14 (T. G. Dietterich, S. Becker, and Z. Ghahramani, eds.), pp. 1311-1318, Cambridge, MA: MIT Press, 2002.
3. P. Viola and M. Jones, "Robust real-time face detection," International Journal of Computer Vision, vol. 57, no. 2, pp. 137-154, 2004.
4. N. V. Chawla, A. Lazarevic, L. O. Hall, and K. W. Bowyer, "SMOTEBoost: Improving prediction of the minority class in boosting," in Proceedings of the 7th European Conference on Principles and Practice of Knowledge Discovery in Databases, (Cavtat-Dubrovnik, Croatia), pp. 107-119, 2003.
5. P. K. Chan and S. J. Stolfo, "Toward scalable learning with non-uniform class and cost distributions: A case study in credit card fraud detection," in Proceedings of the 4th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, (New York), pp. 164-168, 1998.
6. X.-Y. Liu, J. Wu, and Z.-H. Zhou, "Exploratory undersampling for class-imbalance learning," IEEE Transactions on Systems, Man, and Cybernetics - Part B: Cybernetics, vol. 39, no. 2, pp. 539-550, 2009.