FOUNDATIONS OF IMBALANCED LEARNING - Imbalanced Learning: Foundations, Algorithms, and Applications

FOUNDATIONS OF IMBALANCED LEARNING

GARY M. WEISS

Department of Computer and Information Sciences, Fordham University, Bronx, NY, USA

Abstract: Many important learning problems, from a wide variety of domains, involve learning from imbalanced data. Because this learning task is quite challenging, there has been a tremendous amount of research on this topic over the past 15 years. However, much of this research has focused on methods for dealing with imbalanced data, without discussing exactly how or why such methods work, or what underlying issues they address. This is a significant oversight, which this chapter helps to address. This chapter begins by describing what is meant by imbalanced data, and by showing the effects of such data on learning. It then describes the fundamental learning issues that arise when learning from imbalanced data, and categorizes these issues as problem-definition-level issues, data-level issues, or algorithm-level issues. The chapter then describes the methods for addressing these issues and organizes these methods using the same three categories. As one example, the data-level issue of “absolute rarity” (i.e., not having sufficient numbers of minority class examples to properly learn the decision boundaries for the minority class) can best be addressed using a data-level method that acquires additional minority class training examples. But as we shall see in this chapter, sometimes such a direct solution is not available, and less direct methods must be utilized. Common misconceptions are also discussed and explained. Overall, this chapter provides an understanding of the foundations of imbalanced learning by providing a clear description of the relevant issues, and a clear mapping of these issues to the methods that can be used to address them.
