levels of imbalance, then trained classifiers from each of them by over- and
under-sampling, and finally ensembled them.
In [31], the over-sampling method used is random over-sampling with
replacement. The under-sampling method is an informed sampling method,
which first removes redundant examples and then removes borderline examples
and examples suffering from class-label noise. Redundant examples are the
training examples whose role can be replaced by other training examples. They
are identified by the 1-NN rule. Borderline examples are the examples close to the
boundaries between different classes. They are unreliable because even a small
amount of attribute noise can cause such an example to be misclassified. Borderline
examples and examples suffering from class-label noise are detected
by Tomek links [33]. Although threshold-moving is not as popular as sampling
methods, it is very important for CIL. It has been argued that trying other methods,
such as sampling, without first trying to simply adjust the decision threshold may be
misleading [34]. The threshold-moving method uses the original training set to train an
NN and then moves the decision threshold so that minority-class examples
are more likely to be predicted correctly. The three methods mentioned earlier are
used to train three classifiers that are able to handle imbalanced data, and then
hard ensemble and soft ensemble, two popular combination methods, are used to
combine them separately. Hard ensemble uses the crisp classification decisions
to vote, while soft ensemble uses the normalized real-valued outputs to vote.
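To make the combination concrete, the following Python sketch (a minimal illustration, not the exact procedure of [31]) trains the three classifiers on a binary task with the minority class labeled 1 and the majority class labeled 0, and combines them by hard and soft voting. Logistic regression stands in for the neural networks used in [31], the informed under-sampling step is reduced to removing Tomek-link participants from the majority class (the 1-NN redundancy-removal step is omitted), the positive-class prior is taken as the moved threshold, and all function names are ad hoc.

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import NearestNeighbors

def random_oversample(X, y, rng):
    # Duplicate randomly chosen minority (label 1) examples, with
    # replacement, until both classes are the same size.
    pos = np.where(y == 1)[0]
    neg = np.where(y == 0)[0]
    extra = rng.choice(pos, size=len(neg) - len(pos), replace=True)
    idx = np.concatenate([np.arange(len(y)), extra])
    return X[idx], y[idx]

def tomek_undersample(X, y):
    # Remove majority (label 0) examples that take part in a Tomek link,
    # i.e. a mutual nearest-neighbour pair with opposite labels; such
    # examples are borderline or label-noise suspects.  The first returned
    # neighbour of each training point is the point itself (distance 0),
    # assuming there are no exact duplicates, so column 1 is the nearest
    # other example.
    nn = NearestNeighbors(n_neighbors=2).fit(X)
    nearest = nn.kneighbors(X, return_distance=False)[:, 1]
    mutual = nearest[nearest] == np.arange(len(y))
    in_link = mutual & (y != y[nearest])
    keep = ~(in_link & (y == 0))
    return X[keep], y[keep]

def fit_three_classifiers(X, y, seed=0):
    rng = np.random.default_rng(seed)
    clf_over = LogisticRegression(max_iter=1000).fit(*random_oversample(X, y, rng))
    clf_under = LogisticRegression(max_iter=1000).fit(*tomek_undersample(X, y))
    clf_orig = LogisticRegression(max_iter=1000).fit(X, y)
    # Threshold-moving: classify as positive whenever the estimated
    # posterior exceeds the positive-class prior rather than 0.5.
    threshold = y.mean()
    return clf_over, clf_under, clf_orig, threshold

def ensemble_predict(clf_over, clf_under, clf_orig, threshold, X_test):
    p1 = clf_over.predict_proba(X_test)[:, 1]
    p2 = clf_under.predict_proba(X_test)[:, 1]
    p3 = clf_orig.predict_proba(X_test)[:, 1]
    # Hard ensemble: majority vote over the three crisp decisions.
    votes = ((p1 >= 0.5).astype(int) + (p2 >= 0.5).astype(int)
             + (p3 >= threshold).astype(int))
    hard = (votes >= 2).astype(int)
    # Soft ensemble: average the normalized real-valued outputs and
    # threshold the averaged score.
    soft = ((p1 + p2 + p3) / 3 >= 0.5).astype(int)
    return hard, soft

In this sketch the soft ensemble simply thresholds the averaged probability at 0.5; other ways of turning the averaged real-valued output into a decision are equally possible.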
As shown in previous chapters, cost-sensitive learning methods can be used to
handle imbalanced data by assigning higher costs to the minority-class examples,
so that they are more likely to be classified correctly. There are many cost-sensitive
ensemble methods, especially boosting-based methods. Some methods, such as
CBS1, CBS2 [35], and AsymBoost [2], modify the weight-distribution-updating
rule so that the weights of costly examples become higher. Some methods, such as
the linear asymmetric classifier (LAC) [30], change the weights of the base learners
when forming the ensemble. Some methods, such as AdaC1, AdaC2, AdaC3 [36],
and AdaCost [37], change both the weight-updating rule and the
weights of the base learners when forming the ensemble, by associating the costs with
the weighted error rate of each class. Moreover, some methods, such as
Asymmetric Boosting [38], directly minimize a cost-sensitive loss function.
For example, suppose that the costs of misclassifying a positive and a negative
example are $\mathrm{cost}_+$ and $\mathrm{cost}_-$, respectively. AsymBoost modifies the weight
distribution to
$$
D_{t+1}(i) = C \, D_t(i) \, e^{-\alpha_t y_i h_t(x_i)}, \qquad
C = \begin{cases} \sqrt[T]{K}, & \text{for positive examples} \\ 1/\sqrt[T]{K}, & \text{for negative examples} \end{cases}
$$
where $K = \mathrm{cost}_+ / \mathrm{cost}_-$ is the cost ratio and $T$ is the total number of boosting rounds.
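The update above amounts to one extra weighting step per boosting round on top of standard AdaBoost. The Python sketch below illustrates it under stated assumptions: labels are in {-1, +1}, decision stumps serve as base learners, and the function names and the default of T = 50 rounds are arbitrary; it is an illustration of the weight-update rule, not the authors' implementation.

import numpy as np
from sklearn.tree import DecisionTreeClassifier

def asymboost_fit(X, y, K, T=50):
    # Discrete AdaBoost with decision stumps, plus the AsymBoost factor C:
    # each round the weight of every positive example (y = +1) is multiplied
    # by K**(1/T) and of every negative example (y = -1) by K**(-1/T).
    n = len(y)
    D = np.full(n, 1.0 / n)
    learners, alphas = [], []
    for _ in range(T):
        # Asymmetric pre-weighting step (the factor C), then renormalize.
        D *= np.where(y == 1, K ** (1.0 / T), K ** (-1.0 / T))
        D /= D.sum()
        stump = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=D)
        pred = stump.predict(X)
        err = np.clip(D[pred != y].sum(), 1e-10, 1 - 1e-10)
        alpha = 0.5 * np.log((1 - err) / err)
        # Usual AdaBoost exponential weight update, then renormalize.
        D *= np.exp(-alpha * y * pred)
        D /= D.sum()
        learners.append(stump)
        alphas.append(alpha)
    return learners, np.array(alphas)

def asymboost_predict(learners, alphas, X):
    # Sign of the weighted vote of the base learners.
    scores = sum(a * h.predict(X) for a, h in zip(alphas, learners))
    return np.sign(scores)

Because the factor C is applied in every round, the cumulative extra weight given to the positive examples over the T rounds is exactly K, so the overall asymmetry matches the intended cost ratio.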