FOUNDATIONS OF IMBALANCED LEARNING - Imbalanced Learning: Foundations, Algorithms, and Applications


Figure 2.4 Impact of disjunct size on classifier performance (move dataset). [Plot omitted: x-axis is disjunct size, from 3 to 33; y-axis runs from 0.00 to 0.35.]

2.2.3 Imbalanced Data for Unsupervised Learning Tasks

Virtually all work that explicitly addresses imbalanced data focuses on classification. While classification is a key supervised learning task, imbalanced data can affect unsupervised learning tasks as well, such as clustering and association rule mining. There has been very little work on the effect of imbalanced data on clustering, largely because it is difficult to quantify “imbalance” in such cases (in many ways, this parallels the issues with identifying rare cases). But if there are meaningful clusters containing relatively few examples, existing clustering methods will certainly have trouble identifying them. There has been more work in the area of association rule mining, especially with regard to market basket analysis, which looks at how the items purchased by a customer are related. Some groupings of items, such as peanut butter and jelly, occur frequently and can be considered common cases. Other associations may be extremely rare but represent highly profitable sales. For example, cooking pan and spatula will be an extremely rare association in a supermarket, not because the items are unlikely to be purchased together, but because neither item is frequently purchased in a supermarket [14]. Association rule mining algorithms should ideally be able to identify such associations.
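To make the market basket point concrete, here is a minimal Python sketch, assuming an Apriori-style miner that prunes item pairs whose support falls below one global minimum threshold. The transactions, the 0.05 threshold, and the helper names `pair_support` and `confidence` are hypothetical and chosen only for illustration; they are not taken from the text or from any particular library.

```python
from collections import Counter
from itertools import combinations

# Hypothetical supermarket transactions (illustrative only, not data from the text).
# Sets are shared by reference via list multiplication; they are never mutated.
transactions = (
    [{"peanut butter", "jelly", "bread"}] * 40
    + [{"peanut butter", "jelly"}] * 20
    + [{"bread", "milk"}] * 37
    + [{"cooking pan", "spatula"}] * 2
    + [{"cooking pan"}] * 1
)


def pair_support(baskets):
    """Fraction of baskets that contain each unordered item pair."""
    counts = Counter()
    for basket in baskets:
        # Sort so each pair has one canonical (alphabetical) ordering.
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    n = len(baskets)
    return {pair: c / n for pair, c in counts.items()}


def confidence(baskets, antecedent, consequent):
    """Estimate P(consequent in basket | antecedent in basket)."""
    with_antecedent = [b for b in baskets if antecedent in b]
    if not with_antecedent:
        return 0.0
    hits = sum(consequent in b for b in with_antecedent)
    return hits / len(with_antecedent)


support = pair_support(transactions)
min_support = 0.05  # hypothetical single global threshold, Apriori-style
frequent = {pair: s for pair, s in support.items() if s >= min_support}

for pair, s in sorted(support.items(), key=lambda kv: -kv[1]):
    status = "kept" if pair in frequent else "pruned"
    print(f"{pair}: support={s:.2f} ({status})")
print("confidence(spatula -> cooking pan) =",
      confidence(transactions, "spatula", "cooking pan"))
```

In this toy data the peanut butter and jelly pair has support 0.60 and survives the threshold, while the cooking pan and spatula pair has support 0.02 and is pruned, even though every basket containing a spatula also contains a cooking pan (confidence 1.0). Lowering the single threshold far enough to keep the rare pair would also admit many uninteresting low-support pairs in realistic data, which is why rare but strong associations are hard for support-based miners.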

2.3 FOUNDATIONAL ISSUES

Now that we have established the necessary background and terminology, and demonstrated some of the problems associated with class imbalance, we are ready to identify and discuss the specific issues and problems associated with learning from imbalanced data. These issues can be divided into three major
