even more when the bias just described was removed by adjusting the decision
thresholds within the classifier. Other studies that use sampling to handle
rare cases and class imbalance almost never remove this bias and, worse yet,
do not even discuss the implications of that decision. Future studies must
consider this issue much more carefully.
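As an illustration of what such a threshold adjustment can look like, the following is a minimal sketch, not the procedure used in the studies cited here. It assumes that the bias in question is the shift in class priors caused by randomly undersampling the majority (negative) class, that the classifier outputs a probability estimate for the minority class, and that `beta` is the fraction of majority-class examples retained. The function names `correct_posterior` and `adjusted_threshold` are hypothetical. Under these assumptions, undersampling multiplies the odds of the minority class by 1/beta, so the correction multiplies them by beta.

```python
def correct_posterior(p_sampled: float, beta: float) -> float:
    """Map a minority-class probability estimated on undersampled data
    back to the original class distribution.

    Assumption: the majority class was randomly undersampled, keeping a
    fraction `beta` (0 < beta <= 1) of its examples.
    """
    if not 0.0 < beta <= 1.0:
        raise ValueError("beta must lie in (0, 1]")
    if not 0.0 <= p_sampled <= 1.0:
        raise ValueError("p_sampled must lie in [0, 1]")
    # Odds on the sampled data are inflated by 1/beta; undo that inflation.
    return beta * p_sampled / (beta * p_sampled - p_sampled + 1.0)


def adjusted_threshold(threshold: float, beta: float) -> float:
    """Threshold to apply to the uncorrected (sampled-data) probability
    that is equivalent to `threshold` on the corrected probability.

    Obtained by solving correct_posterior(t, beta) == threshold for t.
    """
    if not 0.0 < beta <= 1.0:
        raise ValueError("beta must lie in (0, 1]")
    return threshold / (threshold + beta * (1.0 - threshold))


if __name__ == "__main__":
    beta = 0.1  # hypothetical: 10% of majority-class examples were kept
    # A score of 0.5 on the sampled data corresponds to roughly 0.091
    # under the original class distribution.
    print(correct_posterior(0.5, beta))
    # Equivalently, keep the raw scores and raise the cutoff to roughly 0.909.
    print(adjusted_threshold(0.5, beta))
```

Either route gives the same decisions: correct the probabilities and keep the original threshold, or keep the raw scores and move the threshold. Leaving both unchanged is what carries the sampling bias into the final classifier.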
2.7 RECOMMENDATIONS AND GUIDELINES
The authors of this chapter categorized the major issues that arise with
imbalanced data and described the methods most appropriate for handling each
type of issue. The first recommendation follows directly: use the methods that
best address the underlying issue. This usually means using methods that
operate at the same level as the issue, when possible. Often, however, the
ideal method is not feasible, such as using active learning to obtain more
training data when the issue is absolute rarity. One must then resort to
sampling, but in such cases one should be aware of the drawbacks of sampling
methods and avoid the common misconceptions about them. Unfortunately,
imbalanced data are not easy to handle effectively because of the fundamental
issues involved, which is probably why, even after more than a decade of
intense scrutiny, the research community still has much work to do to develop
effective methods for these problems. Even methods that had become accepted,
such as the use of AUC to generate robust classifiers when good evaluation
metrics are not available, are now coming into question [19]. Nonetheless,
there has been progress, and the problem is certainly much better appreciated
than in the past.
REFERENCES
1. P. Chan and S. Stolfo, “Toward scalable learning with non-uniform class and cost
distributions: A case study in credit card fraud detection,” in Proceedings of the Fourth
International Conference on Knowledge Discovery and Data Mining (New York, NY,
USA), pp. 164–168, AAAI Press, 1998.
2. G. Weiss and H. Hirsh, “Learning to predict rare events in event sequences,” in
Proceedings of the Fourth International Conference on Knowledge Discovery and
Data Mining (New York, NY, USA), pp. 359–363, AAAI Press, 1998.
3. T. Liao, “Classification of weld flaws with imbalanced data,” Expert Systems with
Applications: An International Journal, vol. 35, no. 3, pp. 1041–1052, 2008.
4. G. Weiss and F. Provost, “Learning when training data are costly: The effect of class
distribution on tree induction,” Journal of Artificial Intelligence Research, vol. 19,
pp. 315–354, 2003.
5. N. Japkowicz, “Concept learning in the presence of between-class and within-class
imbalances,” in Proceedings of the Fourteenth Conference of the Canadian Society for
Computational Studies of Intelligence (Ottawa, Canada), pp. 67–77, Springer-Verlag,
2001.