3.3.1 Cost-Sensitive Learning
In cost-sensitive learning, instead of each instance being either correctly or
incorrectly classified, each class (or instance) is given a misclassification cost.
Thus, instead of trying to optimize the accuracy, the problem is to minimize
the total misclassification cost. Many traditional methods are easily extended
with this goal in mind. Support vector machines (SVMs), for instance, can be
easily modified to follow the cost-sensitive framework.
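To make the change of objective concrete, the short sketch below contrasts accuracy with total misclassification cost on a hypothetical two-class example; the cost matrix and labels are assumed values chosen only for illustration.

```python
import numpy as np

# Hypothetical cost matrix: cost[true_class, predicted_class].
# Misclassifying the rare class (class 1) is assumed 10x more costly.
cost = np.array([[0.0, 1.0],
                 [10.0, 0.0]])

y_true = np.array([0, 0, 0, 0, 0, 0, 0, 0, 1, 1])
y_pred = np.array([0, 0, 0, 0, 0, 0, 0, 0, 0, 1])  # one rare instance missed

accuracy = np.mean(y_true == y_pred)        # 0.90: looks good
total_cost = cost[y_true, y_pred].sum()     # 10.0: the costly error dominates

print(f"accuracy = {accuracy:.2f}, total cost = {total_cost:.1f}")
```

Although the classifier above is 90% accurate, almost all of the incurred cost comes from the single missed rare-class instance, which is exactly the situation cost-sensitive learning is designed to penalize.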
To see how this is done, consider the problem of learning an SVM. SVMs
attempt to learn a weight vector w that satisfies the following optimization prob-
lem:
\[
\min_{\mathbf{w},\,\boldsymbol{\xi},\,b} \;\; \frac{1}{2}\|\mathbf{w}\|^{2} + C\sum_{i=1}^{n}\xi_{i} \qquad (3.1)
\]
subject to (for all i ∈ {1, ..., n}):
\[
y_{i}(\mathbf{w}\cdot\mathbf{x}_{i} - b) \ge 1 - \xi_{i} \qquad (3.2)
\]
\[
\xi_{i} \ge 0 \qquad (3.3)
\]
where x_i denotes instance i, y_i denotes the class of instance i, ξ_i denotes the
"slack" for instance i (i.e., how badly misclassified, if at all, instance i is by
w), and C determines the trade-off between training error and model complexity.
In order to make SVMs cost sensitive, the objective function is modified
resulting in:
\[
\min_{\mathbf{w},\,\boldsymbol{\xi},\,b} \;\; \frac{1}{2}\|\mathbf{w}\|^{2} + C\sum_{i=1}^{n} c_{i}\,\xi_{i}, \qquad (3.4)
\]
where c_i is the misclassification cost for instance i.
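In practice this change can often be realized without rewriting the solver. For example, scikit-learn's SVC accepts per-instance sample weights that scale each instance's slack penalty, which plays the role of C · c_i in Equation 3.4; the dataset and cost values below are assumed for illustration only.

```python
import numpy as np
from sklearn.svm import SVC

# Imbalanced toy data: class 1 is the rare class.
rng = np.random.default_rng(0)
X_majority = rng.normal(loc=0.0, scale=1.0, size=(95, 2))
X_minority = rng.normal(loc=2.5, scale=1.0, size=(5, 2))
X = np.vstack([X_majority, X_minority])
y = np.array([0] * 95 + [1] * 5)

# Assumed misclassification costs c_i: slack on rare instances is 10x more costly.
costs = np.where(y == 1, 10.0, 1.0)

# sample_weight multiplies each instance's contribution to the penalty term,
# giving an effective per-instance penalty of C * c_i as in Equation 3.4.
clf = SVC(kernel="linear", C=1.0)
clf.fit(X, y, sample_weight=costs)

print(clf.predict(X[-5:]))   # predictions on the five rare instances
```

When costs are uniform within each class, the same effect can be obtained with the class_weight parameter instead of per-instance weights.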
In addition to modifying traditional learning algorithms, cost-sensitive ensemble
methods have also been developed. AdaCost [27] and cost-sensitive boosting
(CSB) [28] are two extensions of AdaBoost that incorporate the misclassification
cost of an instance when updating instance weights, rather than deriving the
weights from misclassification error alone as AdaBoost does.
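The exact weight updates of AdaCost and CSB differ in their details; the sketch below is only a simplified illustration of the shared idea, scaling a boosting-style exponential reweighting by an assumed per-instance cost, and does not reproduce either published algorithm.

```python
import numpy as np

def cost_aware_reweight(weights, y_true, y_pred, costs, alpha):
    """Illustrative update only (NOT the exact AdaCost or CSB rule):
    misclassified instances are up-weighted in proportion to both the
    round's alpha and their misclassification cost."""
    misclassified = (y_true != y_pred).astype(float)
    new_weights = weights * np.exp(alpha * costs * misclassified)
    return new_weights / new_weights.sum()   # renormalize to a distribution

# Example round: one costly (rare-class) instance is misclassified.
weights = np.full(4, 0.25)
y_true = np.array([0, 0, 1, 1])
y_pred = np.array([0, 0, 0, 1])
costs  = np.array([1.0, 1.0, 5.0, 5.0])     # assumed costs
print(cost_aware_reweight(weights, y_true, y_pred, costs, alpha=0.5))
```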
3.3.2 Skew-Insensitive Learners
While cost-sensitive learning is a popular way of extending classifiers for use
on the class imbalance problem, some classifiers allow for easier extensions that
directly combat the problem of class imbalance.
Naïve Bayes, for instance, is trivially skew insensitive. This is due to the fact
that it makes predictions p(y | x_i) by first computing p(x_i | y) and p(y) from the
training data. Bayes' rule is then applied to obtain a classification for an instance
x_i as: p(y | x_i) ∝ p(x_i | y) · p(y). Since p(x_i | y) is difficult to calculate in general,
a simplifying assumption is made (the "naïve" in "naïve Bayes"), namely, that each
feature of x_i is conditionally independent of the others given the class y.
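A minimal sketch of that factorization is given below, using Gaussian class-conditional densities and an assumed toy dataset; the per-feature densities are multiplied together precisely because of the conditional-independence assumption, and the class priors p(y) are estimated directly from the (possibly imbalanced) training data.

```python
import numpy as np
from scipy.stats import norm

# Toy imbalanced training data: two features, class 1 is rare.
X = np.array([[1.0, 2.0], [1.2, 1.8], [0.8, 2.2], [0.9, 1.9],
              [3.0, 0.5], [3.2, 0.4]])
y = np.array([0, 0, 0, 0, 1, 1])

def predict_naive_bayes(x_new):
    scores = {}
    for c in np.unique(y):
        Xc = X[y == c]
        prior = len(Xc) / len(X)                       # p(y), from class frequencies
        # "Naive" assumption: features are conditionally independent given y,
        # so p(x | y) factorizes into a product of per-feature densities.
        likelihood = np.prod(norm.pdf(x_new, loc=Xc.mean(axis=0),
                                      scale=Xc.std(axis=0) + 1e-9))
        scores[c] = prior * likelihood                 # proportional to p(y | x)
    return max(scores, key=scores.get)

print(predict_naive_bayes(np.array([3.1, 0.45])))      # expected: class 1
```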