NONSTATIONARY STREAM DATA LEARNING WITH IMBALANCED CLASS DISTRIBUTION - Imbalanced Learning: Foundations, Algorithms, and Applications


Table 7.2 AUROC Values for Selected Data Chunks of ELEC Dataset

| Data Chunk | Normal | SMOTE | UB | SERA | REA |
|---|---|---|---|---|---|
| 10 | 0.6763 | 0.6608 | 0.7273 | 0.7428 | **0.8152** |
| 20 | 0.5605 | 0.6715 | 0.6954 | **0.7308** | 0.6429 |
| 30 | 0.6814 | 0.7007 | 0.5654 | 0.6339 | **0.8789** |
| 40 | 0.7102 | 0.6284 | 0.6297 | 0.7516 | **0.9462** |

Bold values represent the highest AUROC values among the compared algorithms at the 10th, 20th, 30th, and 40th data chunks of the SEA and ELEC datasets, respectively.

(Fig. 7.9c), and 40 (Fig. 7.9d). Table 7.2 gives the numerical AUROC values of all compared algorithms on the selected data chunks. The collected data confirm that OA is not a decisive metric for imbalanced learning, as many algorithms here can outperform the baseline. REA provides the best AUROC among all algorithms after data chunk 25, followed by SERA.
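The remark that OA is not decisive under class imbalance can be illustrated with a small sketch. The 95/5 class split and the always-majority classifier below are hypothetical and are not taken from the ELEC experiments; AUROC is computed here as the probability that a random minority example is scored above a random majority example, with ties counted as one half.

```python
import numpy as np

def auroc(y_true, scores):
    """AUROC as P(score of a positive > score of a negative); ties count 0.5."""
    y_true = np.asarray(y_true)
    scores = np.asarray(scores, dtype=float)
    pos = scores[y_true == 1]
    neg = scores[y_true == 0]
    # Pairwise score differences between every positive and every negative.
    diff = pos[:, None] - neg[None, :]
    wins = (diff > 0).sum() + 0.5 * (diff == 0).sum()
    return float(wins / (pos.size * neg.size))

def overall_accuracy(y_true, y_pred):
    """Fraction of examples whose predicted label matches the true label."""
    return float(np.mean(np.asarray(y_true) == np.asarray(y_pred)))

# Hypothetical data chunk: 95 majority (0) and 5 minority (1) examples.
y = np.array([0] * 95 + [1] * 5)

# Degenerate classifier that always outputs the majority class.
const_pred = np.zeros(100, dtype=int)
const_scores = np.zeros(100)

oa = overall_accuracy(y, const_pred)   # high, although no minority example is found
auc = auroc(y, const_scores)           # chance level
print(f"OA = {oa:.2f}, AUROC = {auc:.2f}")
```

Under these assumptions the classifier scores well on OA while its AUROC stays at chance level, which is why the comparison in Table 7.2 is made on AUROC.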

7.4.2.3 SHP Dataset As proposed in [32], the spinning hyperplane (SHP) dataset defines the class boundary as a hyperplane in $n$ dimensions by coefficients $\alpha_1, \alpha_2, \ldots, \alpha_n$. An example $x = (x_1, x_2, \ldots, x_n)$ is created by randomizing each feature in the range $[0, 1]$, that is, $x_i \in [0, 1]$, $i = 1, \ldots, n$. A constant bias is defined as

$$\alpha_0 = \frac{1}{2} \sum_{i=1}^{n} \alpha_i \tag{7.34}$$

Then, the class label y of the example x is determined by

$$y = \begin{cases} 1, & \sum_{i=1}^{n} \alpha_i x_i \ge \alpha_0 \\ 0, & \sum_{i=1}^{n} \alpha_i x_i < \alpha_0 \end{cases} \tag{7.35}$$
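Equations 7.34 and 7.35 can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function names `shp_bias` and `shp_label`, the dimension `n = 5`, and the uniform initialization of the coefficients are assumptions made for the example.

```python
import numpy as np

def shp_bias(alpha):
    """Constant bias of Equation 7.34: half the sum of the coefficients."""
    return 0.5 * np.sum(alpha)

def shp_label(x, alpha):
    """Class label of Equation 7.35: 1 if alpha . x >= alpha_0, else 0."""
    return int(np.dot(alpha, x) >= shp_bias(alpha))

rng = np.random.default_rng(0)
n = 5                                    # assumed dimensionality
alpha = rng.uniform(0.0, 1.0, size=n)    # assumed initial coefficients
x = rng.uniform(0.0, 1.0, size=n)        # each feature drawn from [0, 1]
y = shp_label(x, alpha)
print(x, y)
```

Because each feature has mean 1/2, the weighted sum has mean equal to the bias of Equation 7.34, so this boundary by itself yields roughly balanced classes.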

In contrast to the abrupt concept drifts in the SEA dataset, the SHP dataset adopts a gradual concept drift scheme in which the class concepts undergo a “shift” whenever a new example is created. Specifically, a randomly sampled part of the coefficients $\alpha_1, \ldots, \alpha_n$ has a small increment added whenever a new example is created, which is defined as

$$\Delta = s \times \frac{t}{N} \tag{7.36}$$

where $t$ is the magnitude of change for every $N$ examples, and $s$ alternates in $[-1, 1]$, specifying the direction of change, with a 20% chance of being reversed every $N$ examples. $\alpha_0$ is also updated thereafter using Equation 7.34.
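The drift mechanism can be sketched as a small stream generator. Several details here are assumptions, since the text leaves them open: the function name `shp_stream` is hypothetical, a single direction $s \in \{-1, +1\}$ is shared by all drifting coefficients, the subset of `k` drifting coefficients is sampled once at the start, and the initial coefficients are drawn uniformly from $[0, 1]$.

```python
import numpy as np

def shp_stream(n, k, t, N, num_examples, seed=0, p_reverse=0.2):
    """Generate a drifting hyperplane stream (sketch under stated assumptions).

    n: number of features; k: number of drifting coefficients (assumed fixed);
    t: magnitude of change per N examples; p_reverse: chance that the
    direction s flips after each block of N examples.
    """
    rng = np.random.default_rng(seed)
    alpha = rng.uniform(0.0, 1.0, size=n)            # assumed initialization
    drifting = rng.choice(n, size=k, replace=False)  # assumed fixed subset
    s = 1.0                                          # assumed initial direction
    X = np.empty((num_examples, n))
    y = np.empty(num_examples, dtype=int)
    for j in range(num_examples):
        alpha0 = 0.5 * alpha.sum()                   # Eq. 7.34, recomputed after drift
        X[j] = rng.uniform(0.0, 1.0, size=n)
        y[j] = int(alpha @ X[j] >= alpha0)           # Eq. 7.35
        alpha[drifting] += s * t / N                 # increment of Eq. 7.36
        # After every block of N examples, reverse direction with probability p_reverse.
        if (j + 1) % N == 0 and rng.random() < p_reverse:
            s = -s
    return X, y, alpha

X, y, alpha_final = shp_stream(n=5, k=2, t=0.1, N=100, num_examples=300)
print(X.shape, y[:10], alpha_final)
```

With these settings each drifting coefficient moves by `t` over a block of `N` examples, so the boundary rotates slowly instead of jumping as in SEA.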

In this way, the class boundary behaves like a spinning hyperplane while the data are being created. A dataset with gradual concept drifts requires the learning algorithm
