CLASS IMBALANCE AND ACTIVE LEARNING - Imbalanced Learning: Foundations, Algorithms, and Applications

[Figure 6.9 plot: one panel per category (Acq, Corn, Crude, Earn, Grain, Interest, Ship, Trade, Money-fx, Wheat); curves for SMOTE, Active learning, and VIRTUAL; x-axis: number of training samples seen; y-axis: g-means (%).]

Figure 6.9 Comparison of SMOTE, AL, and VIRTUAL on the 10 largest categories of Reuters-21578. We show the g-means (%) (y-axis) of the current model on the test set versus the number of training samples seen (x-axis).
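The metric on the y-axis, g-means, is commonly defined as the geometric mean of sensitivity (recall on the minority class) and specificity (recall on the majority class), i.e. g = sqrt(sensitivity × specificity). The following is a minimal Python sketch under stated assumptions: the task is binary, the minority class is treated as positive, and the result is scaled to a percentage to match the figure. The function names `g_means` and `accuracy` and the confusion-matrix counts in the usage lines are hypothetical illustrations, not values from the Reuters experiments.

```python
import math


def g_means(tp: int, fn: int, tn: int, fp: int) -> float:
    """Return g-means (%) = 100 * sqrt(sensitivity * specificity).

    Assumes the minority class is labeled positive: sensitivity is the
    recall on the minority class, specificity the recall on the majority class.
    """
    n_pos = tp + fn  # minority-class test examples
    n_neg = tn + fp  # majority-class test examples
    if n_pos == 0 or n_neg == 0:
        raise ValueError("g-means needs both classes in the test set")
    sensitivity = tp / n_pos
    specificity = tn / n_neg
    return 100.0 * math.sqrt(sensitivity * specificity)


def accuracy(tp: int, fn: int, tn: int, fp: int) -> float:
    """Return plain accuracy (%) for comparison."""
    return 100.0 * (tp + tn) / (tp + fn + tn + fp)


# Hypothetical test set: 100 minority (positive) and 900 majority examples.
# A model that always predicts the majority class:
print(accuracy(0, 100, 900, 0), g_means(0, 100, 900, 0))  # 90.0 0.0
# A model that recovers 80 of 100 positives and 855 of 900 negatives:
print(accuracy(80, 20, 855, 45), round(g_means(80, 20, 855, 45), 2))  # 93.5 87.18
```

Because the geometric mean drops to zero whenever one class is entirely misclassified, it penalizes the trivial majority-class predictor that plain accuracy rewards with 90%, which is why it is a better yardstick than accuracy for highly imbalanced categories.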

Table 6.4 presents the g-means and the total learning time for SMOTE, AL, and VIRTUAL. The g-means values of a classical batch SVM are also provided as a reference point. On the Reuters datasets, VIRTUAL yields the highest g-means in all categories, which shows the effectiveness of adaptive virtual instance generation. In the categories corn, interest, and ship, which have high class imbalance ratios, VIRTUAL achieves a substantial improvement in g-means. Compared to AL, VIRTUAL requires additional time to create virtual instances and to select those that may become support vectors. Despite this overhead, VIRTUAL's training times are comparable with those of AL. In cases where minority examples are abundant, SMOTE takes substantially longer than VIRTUAL to create virtual instances. But as the rightmost columns in Table 6.3 show, only a small fraction
