[Figure 6.6: eight panels plotting g-means (%) against the number of training instances for Crude (Reuters), Grain (Reuters), USPS, MNIST-8, OS (CiteSeer), COMM (CiteSeer), Letter-A (UCI), and Abalone-7 (UCI); legend: US, SMOTE, DC, RS, AL.]
Figure 6.6 Comparisons of g-means. The right border of the shaded area corresponds
to the early stopping point.
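The g-means metric compared in Figure 6.6 is the geometric mean of the classifier's accuracy on the positive (minority) class and the negative (majority) class. A minimal sketch of the computation, with a hypothetical helper name not taken from the chapter:

```python
import numpy as np

def g_means(y_true, y_pred):
    """Geometric mean of sensitivity (recall on the positive class)
    and specificity (recall on the negative class).
    Assumes binary labels encoded as 1 (positive) and 0 (negative)."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    pos = y_true == 1
    neg = ~pos
    sensitivity = np.mean(y_pred[pos] == 1) if pos.any() else 0.0
    specificity = np.mean(y_pred[neg] == 0) if neg.any() else 0.0
    return float(np.sqrt(sensitivity * specificity))
```

Because it multiplies the two class-wise accuracies, g-means collapses toward zero when either class is badly misclassified, which is why it is preferred over plain accuracy for imbalanced data.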
addressing the class imbalance problem, the results of the batch algorithm trained on the
original training set are provided as a baseline. LASVM is run in RS mode
for US, SMOTE, and DC.
We present comparisons of the methods on the g-means performance metric
for several datasets in Figure 6.6. The right border of the shaded light gray area
marks the point where the aforementioned early stopping strategy is applied. The
curves in the graphs are averages of 10 runs. For completeness, all AL experiments
were allowed to continue selecting examples until exhaustion, bypassing
any early stopping. Table 6.2 presents the PRBEP of the methods and the total
running times of SMOTE and AL on 18 benchmark and real-world datasets.
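The PRBEP (precision-recall break-even point) reported in Table 6.2 is the precision at the decision threshold where precision equals recall. One standard way to compute it is to rank examples by score and take the precision of the top-k predictions, where k is the number of true positives; at that cutoff precision and recall coincide. A sketch under those assumptions, with a hypothetical helper name:

```python
import numpy as np

def prbep(y_true, scores):
    """Precision-recall break-even point for binary labels (1/0)
    and real-valued classifier scores. At rank k = number of true
    positives, precision equals recall, so PRBEP is the fraction
    of true positives among the k highest-scoring examples."""
    y_true = np.asarray(y_true)
    order = np.argsort(-np.asarray(scores))  # descending by score
    k = int(y_true.sum())                    # predict top-k as positive
    return float(y_true[order[:k]].mean())
```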
The results for AL in Table 6.2 are those at the early stopping points. The
results for the other methods in Table 6.2 are the values at the end of the
curves, that is, when trained with the entire dataset, as those methods do not
employ any early stopping criteria. We did not apply early stopping criteria to
the other methods because, as observed from Figure 6.6, no early stopping
criterion would achieve a training time comparable to AL's without a significant
loss in their prediction performance, given their convergence times. The other