Table 7.2 AUROC Values for Selected Data Chunks of ELEC Dataset

Data Chunk   Normal   SMOTE    UB       SERA      REA
10           0.6763   0.6608   0.7273   0.7428    0.8152*
20           0.5605   0.6715   0.6954   0.7308*   0.6429
30           0.6814   0.7007   0.5654   0.6339    0.8789*
40           0.7102   0.6284   0.6297   0.7516    0.9462*

Values marked with an asterisk (*) are the highest AUROC values among the
algorithms in comparison at the 10th, 20th, 30th, and 40th data chunks of the
ELEC dataset.
(Fig. 7.9c), and 40 (Fig. 7.9d). Table 7.2 gives the numerical values for AUROC
of all comparative algorithms on selected data chunks. The data collected validate
that OA is not a decisive metric for imbalanced learning, as many algorithms
here can outperform the baseline. REA achieves the best AUROC among all
algorithms after data chunk 25, followed by SERA.
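One way to see why OA and AUROC can disagree on imbalanced data is the toy comparison below. It is only a sketch on made-up scores, not the ELEC data or any of the algorithms in Table 7.2; the helper names `auroc` and `overall_accuracy`, the 0.5 threshold, and both score lists are assumptions introduced for illustration.

```python
def auroc(labels, scores):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair. Both classes must be present.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def overall_accuracy(labels, predictions):
    """Fraction of examples whose predicted label matches the true label."""
    return sum(y == p for y, p in zip(labels, predictions)) / len(labels)


# Hypothetical chunk: 2 minority (1) and 18 majority (0) examples.
labels = [1, 1] + [0] * 18

# Classifier A gives every example the same low score (always predicts majority).
scores_a = [0.1] * 20
# Classifier B ranks both minority examples on top but also flags 3 majority ones.
scores_b = [0.9, 0.8] + [0.7, 0.6, 0.6] + [0.2] * 15

threshold = 0.5  # assumed decision threshold
pred_a = [int(s >= threshold) for s in scores_a]
pred_b = [int(s >= threshold) for s in scores_b]

oa_a, oa_b = overall_accuracy(labels, pred_a), overall_accuracy(labels, pred_b)
auc_a, auc_b = auroc(labels, scores_a), auroc(labels, scores_b)
print(f"A: OA={oa_a:.2f} AUROC={auc_a:.2f}")
print(f"B: OA={oa_b:.2f} AUROC={auc_b:.2f}")
```

In this toy case classifier A has the higher OA even though it never detects a minority example, while classifier B has the higher AUROC, which is the kind of disagreement the paragraph above describes.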
7.4.2.3 SHP Dataset As proposed in [32], the spinning hyperplane (SHP)
dataset defines the class boundary as a hyperplane in $n$ dimensions by coefficients
$\alpha_1, \alpha_2, \ldots, \alpha_n$. An example $x = (x_1, x_2, \ldots, x_n)$ is created by randomizing each
feature in the range $[0, 1]$, that is, $x_i \in [0, 1]$, $i = 1, \ldots, n$. A constant bias is
defined as
$$\alpha_0 = \frac{1}{2} \sum_{i=1}^{n} \alpha_i \tag{7.34}$$
Then, the class label y of the example x is determined by
$$y = \begin{cases} 1, & \sum_{i=1}^{n} \alpha_i x_i \ge \alpha_0 \\ 0, & \sum_{i=1}^{n} \alpha_i x_i < \alpha_0 \end{cases} \tag{7.35}$$
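As a rough illustration, the bias of Equation 7.34 and the labeling rule of Equation 7.35 can be sketched in Python as follows. The function names, the seed, the number of dimensions, and the way the initial coefficients are drawn are assumptions made for this sketch, and the label is taken to be 1 when the weighted sum reaches the bias.

```python
import random


def bias(alpha):
    """Constant bias alpha_0 from Equation 7.34: half the sum of the coefficients."""
    return 0.5 * sum(alpha)


def label(alpha, x):
    """Class label from Equation 7.35, assuming label 1 when the sum reaches alpha_0."""
    weighted_sum = sum(a * xi for a, xi in zip(alpha, x))
    return 1 if weighted_sum >= bias(alpha) else 0


def make_example(alpha, rng):
    """Draw each feature uniformly at random and label the resulting example."""
    # random.random() samples from [0, 1), a close stand-in for the range [0, 1].
    x = [rng.random() for _ in alpha]
    return x, label(alpha, x)


rng = random.Random(0)  # fixed seed so the sketch is repeatable
# Assumption: initial coefficients are drawn uniformly from [0, 1); the text does not say.
alpha = [rng.random() for _ in range(5)]
examples = [make_example(alpha, rng) for _ in range(1000)]
positive_rate = sum(y for _, y in examples) / len(examples)
print(f"fraction labeled 1: {positive_rate:.3f}")
```

Because the bias is half the coefficient sum, the hyperplane passes through the center of the unit hypercube, so with uniformly drawn features roughly half of the examples should fall on each side.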
In contrast to the abrupt concept drifts in the SEA dataset, the SHP dataset
follows a gradual concept drift scheme in which the class concepts undergo
a “shift” whenever a new example is created. Specifically, a randomly sampled
subset of the coefficients $\alpha_1, \ldots, \alpha_n$ has a small increment added
whenever a new example is created, which is defined as
$$\Delta = s \times \frac{t}{N} \tag{7.36}$$
where $t$ is the magnitude of change for every $N$ examples, and $s$ alternates in
$[-1, 1]$, specifying the direction of change, and has a 20% chance of being
reversed for every $N$ examples. $\alpha_0$ is also updated thereafter using Equation 7.34.
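The drift rule just described can be sketched as below. This is a minimal illustration under stated assumptions, not the generator used in [32]: the direction is taken to be either -1 or 1, the number of drifting coefficients `k` and all numeric settings are hypothetical, and the helper names are invented for the example.

```python
import random


def drift_step(alpha, s, t, N, k, rng):
    """Add the increment s * t / N to k randomly chosen coefficients, in place.

    Returns the bias alpha_0 recomputed as half the coefficient sum (Equation 7.34).
    """
    delta = s * t / N  # per-example increment, Equation 7.36
    for i in rng.sample(range(len(alpha)), k):
        alpha[i] += delta
    return 0.5 * sum(alpha)


def maybe_reverse(s, rng, p=0.2):
    """Flip the direction s with probability p (20% in the text)."""
    return -s if rng.random() < p else s


rng = random.Random(0)
alpha = [0.5] * 10            # hypothetical starting coefficients
s, t, N, k = 1, 0.1, 1000, 3  # hypothetical settings; k = number of drifting coefficients
for j in range(1, 5001):
    alpha_0 = drift_step(alpha, s, t, N, k, rng)  # one small shift per new example
    if j % N == 0:
        s = maybe_reverse(s, rng)  # direction may reverse once every N examples
```

Dividing by $N$ spreads a change of magnitude $t$ over $N$ examples, which is what makes the drift gradual rather than abrupt.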
In this way, the class boundary behaves like a spinning hyperplane while the data
are being created. A dataset with gradual concept drifts requires the learning algorithm