FOUNDATIONS OF IMBALANCED LEARNING
GARY M. WEISS
Department of Computer and Information Sciences, Fordham University, Bronx,
NY, USA
Abstract: Many important learning problems, from a wide variety of domains,
involve learning from imbalanced data. Because this learning task is quite
challenging, there has been a tremendous amount of research on this topic over the past
15 years. However, much of this research has focused on methods for dealing with
imbalanced data, without discussing exactly how or why such methods work — or
what underlying issues they address. This is a significant oversight, which this
chapter helps to address. This chapter begins by describing what is meant by
imbalanced data, and by showing the effects of such data on learning. It then describes
the fundamental learning issues that arise when learning from imbalanced data,
and categorizes these issues as problem-definition-level issues, data-level issues,
or algorithm-level issues. The chapter then describes the methods for addressing
these issues and organizes these methods using the same three categories. As one
example, the data-level issue of “absolute rarity” (i.e., not having sufficient numbers
of minority class examples to properly learn the decision boundaries for the
minority class) can best be addressed using a data-level method that acquires additional
minority class training examples. But as we shall see in this chapter, sometimes
such a direct solution is not available, and less direct methods must be utilized.
Common misconceptions are also discussed and explained. Overall, this chapter
provides an understanding of the foundations of imbalanced learning by providing
a clear description of the relevant issues, and a clear mapping of these issues to
the methods that can be used to address them.