CLASS IMBALANCE AND ACTIVE LEARNING
JOSH ATTENBERG
Etsy, Brooklyn, NY, USA, and NYU Stern School of Business, New York, NY, USA
ŞEYDA ERTEKIN
MIT Sloan School of Management, Massachusetts Institute of Technology, Cambridge,
MA, USA
Abstract: The performance of a predictive model is tightly coupled with the
data used during training. While training on more examples will often result in a better-informed, more accurate model, limits on computer memory and the real-world costs associated with gathering labeled examples often constrain the amount of data that can be used for training. When the number of training examples is limited, it becomes important to consider carefully which examples are selected. In active learning (AL), the model itself plays a hands-on role
in the selection of examples for labeling from a large pool of unlabeled examples.
These examples are used for model training. Numerous studies have demonstrated,
both empirically and theoretically, the benefits of AL: Given a fixed budget, a
training system that interactively involves the current model in selecting the training
examples can often result in far greater accuracy than a system that simply
selects random training examples. Imbalanced settings provide special opportunities
and challenges for AL. For example, while AL can be used to build models that
counteract the harmful effects of learning under class imbalance, extreme class
imbalance can cause an AL strategy to “fail,” preventing the selection scheme from
choosing any useful examples for labeling. This chapter focuses on the interaction
between AL and class imbalance, discussing (i) AL techniques designed specifically
for dealing with imbalanced settings, (ii) strategies that leverage AL to overcome
the deleterious effects of class imbalance, (iii) how extreme class imbalance can
prevent AL systems from selecting useful examples, and (iv) alternatives to AL in these cases.
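The selection loop sketched in the abstract is straightforward to make concrete. Below is a minimal sketch of pool-based active learning with uncertainty sampling, assuming a scikit-learn logistic regression model and a synthetic, imbalanced unlabeled pool; the dataset, seed size, and labeling budget are illustrative choices, not the chapter's experimental setup.

# A minimal sketch of pool-based active learning with uncertainty sampling.
# The dataset, model, seed size, and budget are illustrative assumptions.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Synthetic, imbalanced unlabeled pool: roughly 5% positive examples.
X_pool, y_pool = make_classification(
    n_samples=2000, n_features=20, weights=[0.95, 0.05], random_state=0
)

# Seed the labeled set with a few examples of each class. (In practice the
# labels come from a human oracle; here we simply reveal y_pool on request.)
pos, neg = np.where(y_pool == 1)[0], np.where(y_pool == 0)[0]
labeled = list(rng.choice(pos, 5, replace=False)) + list(rng.choice(neg, 15, replace=False))
unlabeled = [i for i in range(len(X_pool)) if i not in set(labeled)]

budget = 200  # fixed labeling budget
model = LogisticRegression(max_iter=1000)

for _ in range(budget):
    model.fit(X_pool[labeled], y_pool[labeled])
    # Uncertainty sampling: query the unlabeled example whose predicted
    # positive-class probability is closest to 0.5.
    proba = model.predict_proba(X_pool[unlabeled])[:, 1]
    query = unlabeled[int(np.argmin(np.abs(proba - 0.5)))]
    labeled.append(query)
    unlabeled.remove(query)

print("labeled examples:", len(labeled),
      "positives found:", int(y_pool[labeled].sum()))

Under a fixed budget, comparing the resulting model against one trained on the same number of randomly selected labels illustrates the kind of gain the abstract describes; under far more extreme imbalance, the same uncertainty criterion may never locate any minority-class examples, which is the failure mode the chapter goes on to discuss.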