[Figure 6.8 appears here: four panels (Abalone, Breast cancer, Letter, Satimage), each plotting g-means (%) against the number of training samples seen, with curves for SMOTE, Active learning, and VIRTUAL.]
Figure 6.8 Comparison of SMOTE, AL, and VIRTUAL on UCI datasets. We present the g-means (%) (y-axis) of the current model for the test set versus the number of training samples (x-axis) seen.
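The g-means score plotted on the y-axis is the geometric mean of sensitivity and specificity, which penalizes a classifier that sacrifices the minority class for overall accuracy. A minimal sketch from confusion-matrix counts (the helper name is ours):

```python
import math

def g_means(tp, fn, tn, fp):
    """Geometric mean of sensitivity (TP rate) and specificity (TN rate)."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return math.sqrt(sensitivity * specificity)

# 90 of 100 positives and 95 of 100 negatives classified correctly.
score = 100 * g_means(tp=90, fn=10, tn=95, fp=5)
print(f"g-means: {score:.2f}%")
```

Because it is a geometric mean, the score collapses to zero if either class is missed entirely, unlike plain accuracy.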
However, SMOTE converges to higher g-means than AL in some of the categories, indicating that the virtual positive examples provide additional information that can be used to improve the model. VIRTUAL converges to the same or even higher g-means than SMOTE while generating fewer virtual instances. For the UCI datasets (Fig. 6.8), VIRTUAL performs as well as AL on Abalone in g-means and consistently outperforms both AL and SMOTE on the other three datasets.
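For reference, the virtual positive examples discussed above are produced by SMOTE-style interpolation: a synthetic sample is placed at a random point on the segment between a minority example and one of its minority-class nearest neighbors. A minimal sketch of that interpolation step, assuming the neighbor has already been found (function name is ours):

```python
import random

def smote_virtual(x, neighbor):
    """Synthesize a minority sample on the segment from x toward neighbor."""
    gap = random.random()  # uniform in [0, 1)
    return [xi + gap * (ni - xi) for xi, ni in zip(x, neighbor)]

x = [1.0, 2.0]
nb = [3.0, 4.0]
virtual = smote_virtual(x, nb)
```

Every coordinate of the virtual instance lies between the corresponding coordinates of the two parents, so the synthetic point stays inside the minority region spanned by its neighbors.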
In Table 6.4, the support vector imbalance ratios of all three methods are lower than the data imbalance ratio, and VIRTUAL achieves the most balanced ratio of positive to negative support vectors on the Reuters datasets. Although the datasets have different data distributions, the fraction of virtual instances that become support vectors is consistently and significantly higher in VIRTUAL than in SMOTE. These results confirm the earlier discussion that VIRTUAL is more effective at generating informative virtual instances.
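The support vector imbalance ratio compared in Table 6.4 can be read directly off a trained SVM by counting support vectors per class. A minimal sketch using scikit-learn on a synthetic imbalanced dataset (the data here is purely illustrative, not from the chapter's benchmarks):

```python
from sklearn.datasets import make_classification
from sklearn.svm import SVC

# Synthetic imbalanced dataset: roughly 10% positives (class 1).
X, y = make_classification(n_samples=500, weights=[0.9, 0.1], random_state=0)

clf = SVC(kernel="rbf").fit(X, y)

# Class labels of the support vectors, via their training-set indices.
sv_labels = y[clf.support_]
pos_sv = int((sv_labels == 1).sum())
neg_sv = int((sv_labels == 0).sum())

data_ratio = (y == 0).sum() / (y == 1).sum()  # negatives per positive
sv_ratio = neg_sv / pos_sv
print(f"data imbalance ratio: {data_ratio:.2f}, SV imbalance ratio: {sv_ratio:.2f}")
```

Because support vectors concentrate near the decision boundary, the support vector imbalance ratio is typically closer to 1 than the raw data imbalance ratio, which is the pattern the table reports for all three methods.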