levels of imbalance, then trained classifiers from each of them by over- and
under-sampling, and finally ensembled them.
In [31], the over-sampling method used is random over-sampling with
replacement. The under-sampling method is an informed sampling method,
which first removes redundant examples and then removes borderline examples
and examples suffering from class-label noise. Redundant examples are the
training examples whose role can be replaced by other training examples. They
are identified by the 1-NN rule. Borderline examples are the examples close to the
boundaries between different classes. They are unreliable because even a small
amount of attribute noise can cause such an example to be misclassified. Borderline
examples and examples suffering from class-label noise are detected
by Tomek links [33]. Although threshold-moving is not as popular as sampling
methods, it is very important for CIL. It has been argued that trying other methods,
such as sampling, without first trying to simply adjust the decision threshold may be
misleading [34]. The threshold-moving method uses the original training set to train an
NN and then moves the decision threshold so that minority-class examples
are more likely to be predicted correctly. The three methods mentioned earlier are
used to train three classifiers that are able to handle imbalanced data, and then
hard ensemble and soft ensemble, two popular combination methods, are used to
combine them separately. Hard ensemble uses the crisp classification decisions
to vote, while soft ensemble uses the normalized real-valued outputs to vote.
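To make the combination concrete, the following Python sketch (a minimal illustration, not the exact procedure of [31]) trains the three classifiers on a binary task with the minority class labeled 1 and the majority class labeled 0, and combines them by hard and soft voting. Logistic regression stands in for the neural networks used in [31], the informed under-sampling step is reduced to removing Tomek-link participants from the majority class (the 1-NN redundancy-removal step is omitted), the positive-class prior is taken as the moved threshold, and all function names are ad hoc.

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import NearestNeighbors

def random_oversample(X, y, rng):
    # Duplicate randomly chosen minority (label 1) examples, with
    # replacement, until both classes are the same size.
    pos = np.where(y == 1)[0]
    neg = np.where(y == 0)[0]
    extra = rng.choice(pos, size=len(neg) - len(pos), replace=True)
    idx = np.concatenate([np.arange(len(y)), extra])
    return X[idx], y[idx]

def tomek_undersample(X, y):
    # Remove majority (label 0) examples that take part in a Tomek link,
    # i.e. a mutual nearest-neighbour pair with opposite labels; such
    # examples are borderline or label-noise suspects.  The first returned
    # neighbour of each training point is the point itself (distance 0),
    # assuming there are no exact duplicates, so column 1 is the nearest
    # other example.
    nn = NearestNeighbors(n_neighbors=2).fit(X)
    nearest = nn.kneighbors(X, return_distance=False)[:, 1]
    mutual = nearest[nearest] == np.arange(len(y))
    in_link = mutual & (y != y[nearest])
    keep = ~(in_link & (y == 0))
    return X[keep], y[keep]

def fit_three_classifiers(X, y, seed=0):
    rng = np.random.default_rng(seed)
    clf_over = LogisticRegression(max_iter=1000).fit(*random_oversample(X, y, rng))
    clf_under = LogisticRegression(max_iter=1000).fit(*tomek_undersample(X, y))
    clf_orig = LogisticRegression(max_iter=1000).fit(X, y)
    # Threshold-moving: classify as positive whenever the estimated
    # posterior exceeds the positive-class prior rather than 0.5.
    threshold = y.mean()
    return clf_over, clf_under, clf_orig, threshold

def ensemble_predict(clf_over, clf_under, clf_orig, threshold, X_test):
    p1 = clf_over.predict_proba(X_test)[:, 1]
    p2 = clf_under.predict_proba(X_test)[:, 1]
    p3 = clf_orig.predict_proba(X_test)[:, 1]
    # Hard ensemble: majority vote over the three crisp decisions.
    votes = ((p1 >= 0.5).astype(int) + (p2 >= 0.5).astype(int)
             + (p3 >= threshold).astype(int))
    hard = (votes >= 2).astype(int)
    # Soft ensemble: average the normalized real-valued outputs and
    # threshold the averaged score.
    soft = ((p1 + p2 + p3) / 3 >= 0.5).astype(int)
    return hard, soft

In this sketch the soft ensemble simply thresholds the averaged probability at 0.5; other ways of turning the averaged real-valued output into a decision are equally possible.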
As shown in previous chapters, cost-sensitive learning methods can be used to
handle imbalanced data by assigning higher costs to the minority-class examples,
so that they are more likely to be classified correctly. There are many cost-sensitive
ensemble methods, especially boosting-based methods. Some methods, such as
CBS1, CBS2 [35], and AsymBoost [2], modify the weight-distribution-updating
rule so that the weights of costly examples become higher. Some methods, such as
the linear asymmetric classifier (LAC) [30], change the weights of the base learners
when forming the ensemble. Some methods, such as AdaC1, AdaC2, AdaC3 [36],
and AdaCost [37], change both the weight-updating rule and the
weights of the base learners when forming the ensemble, by associating the costs with
the weighted error rate of each class. Moreover, some methods, such as
Asymmetric Boosting [38], directly minimize a cost-sensitive loss function.
For example, suppose that the costs of misclassifying a positive and a negative
example are $\mathrm{cost}_+$ and $\mathrm{cost}_-$, respectively. AsymBoost modifies the weight
distribution to
$$
D_{t+1}(i) = C \, D_t(i) \, e^{-\alpha_t y_i h_t(x_i)}, \qquad
C = \begin{cases} \sqrt[T]{K}, & \text{for positive examples} \\ 1/\sqrt[T]{K}, & \text{for negative examples} \end{cases}
$$
where $K = \mathrm{cost}_+ / \mathrm{cost}_-$ is the cost ratio and $T$ is the total number of boosting rounds.
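The update above amounts to one extra weighting step per boosting round on top of standard AdaBoost. The Python sketch below illustrates it under stated assumptions: labels are in {-1, +1}, decision stumps serve as base learners, and the function names and the default of T = 50 rounds are arbitrary; it is an illustration of the weight-update rule, not the authors' implementation.

import numpy as np
from sklearn.tree import DecisionTreeClassifier

def asymboost_fit(X, y, K, T=50):
    # Discrete AdaBoost with decision stumps, plus the AsymBoost factor C:
    # each round the weight of every positive example (y = +1) is multiplied
    # by K**(1/T) and of every negative example (y = -1) by K**(-1/T).
    n = len(y)
    D = np.full(n, 1.0 / n)
    learners, alphas = [], []
    for _ in range(T):
        # Asymmetric pre-weighting step (the factor C), then renormalize.
        D *= np.where(y == 1, K ** (1.0 / T), K ** (-1.0 / T))
        D /= D.sum()
        stump = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=D)
        pred = stump.predict(X)
        err = np.clip(D[pred != y].sum(), 1e-10, 1 - 1e-10)
        alpha = 0.5 * np.log((1 - err) / err)
        # Usual AdaBoost exponential weight update, then renormalize.
        D *= np.exp(-alpha * y * pred)
        D /= D.sum()
        learners.append(stump)
        alphas.append(alpha)
    return learners, np.array(alphas)

def asymboost_predict(learners, alphas, X):
    # Sign of the weighted vote of the base learners.
    scores = sum(a * h.predict(X) for a, h in zip(alphas, learners))
    return np.sign(scores)

Because the factor C is applied in every round, the cumulative extra weight given to the positive examples over the T rounds is exactly K, so the overall asymmetry matches the intended cost ratio.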