Figure 7.7 (Continued) (d) ROC curves of the algorithms in comparison (Normal, SMOTE, UB, SERA, REA) for the 40th data chunk of the SEA dataset.
Table 7.1 AUROC Values for Selected Data Chunks of SEA Dataset

Data Chunk    Normal    SMOTE     UB        SERA      REA
10            0.9600    0.9749    0.9681    0.9637    1.0000
20            0.9349    0.9397    0.9276    0.9373    0.9966
30            0.9702    0.9602    0.9565    0.9415    0.9964
40            0.9154    0.9770    0.9051    0.9497    1.0000
Bold values represent the highest AUROC values measured when running the algorithms in comparison on the 10th, 20th, 30th, and 40th data chunks of the SEA and ELEC datasets, respectively.
With the order of the examples unchanged, the extracted dataset is evenly
sliced into 40 data chunks. Inside each data chunk, examples that represent the
electricity price going down are treated as the majority class data, while the
others, which represent the electricity price going up, are randomly under-sampled
to form the minority class data. The imbalance ratio is set to 0.05, which means
that only 5% of the examples inside each data chunk belong to the minority class.
To complete the preparation of this dataset, 80% of the majority class data and
80% of the minority class data inside each data chunk are randomly sampled and
merged as the training data, and the remaining examples are used to assess the
performance of the corresponding trained hypotheses.
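The preparation steps described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' code. The function name `prepare_chunks`, its parameters, the label encoding (0 for price down, 1 for price up), and the decision to drop leftover examples that do not fill a whole chunk are all assumptions made for the example.

```python
import random

def prepare_chunks(examples, n_chunks=40, minority_fraction=0.05,
                   train_fraction=0.8, seed=0):
    """Slice an ordered stream into chunks and build imbalanced train/test sets.

    `examples` is a list of (features, label) pairs in their original order.
    Assumed label encoding: 0 = price down (majority), 1 = price up (minority).
    """
    rng = random.Random(seed)
    # Assumption: examples that do not fill a whole chunk are dropped.
    chunk_size = len(examples) // n_chunks
    prepared = []
    for i in range(n_chunks):
        chunk = examples[i * chunk_size:(i + 1) * chunk_size]
        majority = [e for e in chunk if e[1] == 0]
        minority = [e for e in chunk if e[1] == 1]
        # Under-sample the minority so that it makes up `minority_fraction`
        # of the resulting chunk: n_min / (n_min + n_maj) = minority_fraction.
        target = round(minority_fraction * len(majority) / (1 - minority_fraction))
        minority = rng.sample(minority, min(target, len(minority)))
        train, test = [], []
        # Sample 80% of each class for training; the rest is held out.
        for group in (majority, minority):
            group = group[:]
            rng.shuffle(group)
            cut = round(train_fraction * len(group))
            train += group[:cut]
            test += group[cut:]
        prepared.append((train, test))
    return prepared
```

Calling the function with a different `seed` for each run would reproduce the idea that the randomness of a run comes from the random under-sampling of the minority class.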
The results of the simulation are based on 10 random runs, where the randomness
comes from the random under-sampling of the minority class data. As
in the procedure for the SEA dataset, observation points are set up at data chunks
5, 10, 15, 20, 25, 30, 35, and 40.
Figure 7.8a plots the averaged OA of the comparative algorithms. One might
be misled into believing that the baseline is the best algorithm because it gives a
better OA rate than the other algorithms after data chunk 25. This can easily be
proved wrong. Assume that a dumb method is introduced, which just classifies