methods include random and focused under/oversampling methods and synthetic
data generation methods such as the synthetic minority oversampling technique
(SMOTE) [11]. Resampling methods have been successfully applied to train
SVMs with imbalanced datasets in different domains [10-16].
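As a concrete illustration, the following is a minimal sketch of SMOTE-based resampling before SVM training, assuming the third-party imbalanced-learn library and scikit-learn; the toy data and class sizes are purely illustrative.

import numpy as np
from imblearn.over_sampling import SMOTE
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Toy imbalanced dataset: 500 negative versus 50 positive examples.
X = np.vstack([rng.normal(0.0, 1.0, (500, 2)),
               rng.normal(2.0, 1.0, (50, 2))])
y = np.array([0] * 500 + [1] * 50)

# SMOTE synthesizes new minority examples along line segments joining
# neighboring minority points, balancing the classes before training.
X_res, y_res = SMOTE(random_state=0).fit_resample(X, y)
clf = SVC(kernel="rbf").fit(X_res, y_res)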
In particular, Batuwita and Palade [17] present an efficient focused oversampling method for SVMs. In this method, the separating hyperplane found by training an SVM model on the original imbalanced dataset is first used to select the most informative examples for a given classification problem, namely the data points lying around the class-boundary region. Then, only these selected examples are balanced by oversampling, as opposed to blindly oversampling the complete dataset. This method reduces the SVM training time significantly while obtaining classification results comparable to those of the original oversampling method.
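A sketch in the spirit of this method is given below, under two simplifying assumptions that are not part of the original formulation: the boundary region is taken to be the margin band where |f(x)| < 1 for the initial SVM's decision function f, and the oversampling step is plain duplication of minority examples.

import numpy as np
from sklearn.svm import SVC

def focused_oversample(X, y, random_state=0):
    rng = np.random.default_rng(random_state)
    # Step 1: train an initial SVM on the original imbalanced data.
    initial = SVC(kernel="rbf").fit(X, y)
    # Step 2: keep only examples near the separating hyperplane,
    # i.e., those falling inside the margin band |f(x)| < 1.
    near = np.abs(initial.decision_function(X)) < 1.0
    Xb, yb = X[near], y[near]
    # Step 3: oversample only the minority points in that region
    # (duplication here; SMOTE-style synthesis is equally possible).
    pos, neg = Xb[yb == 1], Xb[yb == 0]
    if 0 < len(pos) < len(neg):
        idx = rng.integers(0, len(pos), size=len(neg) - len(pos))
        Xb = np.vstack([Xb, pos[idx]])
        yb = np.concatenate([yb, np.ones(len(idx), dtype=int)])
    # Step 4: retrain on the balanced boundary-region examples only.
    return SVC(kernel="rbf").fit(Xb, yb)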
The support cluster machines (SCM) method presented in [18] can be viewed as another focused resampling method for SVMs. This method first partitions the negative examples into disjoint clusters using the kernel k-means clustering method. Then, it trains an initial SVM model using the positive examples and the representatives of the negative clusters, namely the data examples representing the cluster centers. Using the global picture provided by this initial SVM, the method approximately identifies the support vectors and nonsupport vectors, and a shrinking technique is then used to remove the examples that are most probably not support vectors. This procedure of clustering and shrinking is repeated iteratively until convergence.
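A rough sketch of this procedure follows, with two simplifications that are assumptions rather than part of the original method: plain k-means stands in for kernel k-means, and a single shrinking pass replaces the iteration to convergence.

import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import SVC

def scm_sketch(X, y, n_clusters=20, random_state=0):
    X_pos, X_neg = X[y == 1], X[y == 0]
    # Step 1: partition the negatives into disjoint clusters and take
    # the cluster centers as their representatives.
    km = KMeans(n_clusters=n_clusters, n_init=10,
                random_state=random_state).fit(X_neg)
    reps = km.cluster_centers_
    # Step 2: train an initial SVM on positives plus representatives.
    X0 = np.vstack([X_pos, reps])
    y0 = np.array([1] * len(X_pos) + [0] * len(reps))
    initial = SVC(kernel="rbf").fit(X0, y0)
    # Step 3 (shrinking): drop negatives lying far from the boundary,
    # which are most probably not support vectors, then retrain.
    keep = initial.decision_function(X_neg) > -1.0
    X1 = np.vstack([X_pos, X_neg[keep]])
    y1 = np.array([1] * len(X_pos) + [0] * int(keep.sum()))
    return SVC(kernel="rbf").fit(X1, y1)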
5.4.2 Ensemble Learning Methods
Ensemble learning has also been applied as a solution for training SVMs with
imbalanced datasets [19-22]. Generally, in these methods, the majority class
dataset is divided into multiple sub-datasets such that each sub-dataset contains
approximately the same number of examples as the minority class dataset. This
can be done by random sampling with or without replacement (bootstrapping),
or through clustering methods. Then, a set of SVM classifiers is developed, each
trained with the same positive dataset and a different negative sub-dataset.
Finally, the decisions made by the classifier ensemble are combined using a
method such as majority voting. In addition, specialized boosting algorithms,
such as AdaCost [23], RareBoost [24], and SMOTEBoost [25], which have been
used in class imbalance learning in ensemble settings, could also be applied
with SVMs.
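The following is a minimal sketch of the ensemble scheme just described, assuming random undersampling of the negatives without replacement and majority voting; the ensemble size of five is an arbitrary choice.

import numpy as np
from sklearn.svm import SVC

def train_svm_ensemble(X, y, n_models=5, random_state=0):
    rng = np.random.default_rng(random_state)
    X_pos, X_neg = X[y == 1], X[y == 0]
    models = []
    for _ in range(n_models):
        # Draw (without replacement) as many negatives as there are
        # positives, so each member sees a balanced training set.
        idx = rng.choice(len(X_neg), size=len(X_pos), replace=False)
        Xi = np.vstack([X_pos, X_neg[idx]])
        yi = np.array([1] * len(X_pos) + [0] * len(X_pos))
        models.append(SVC(kernel="rbf").fit(Xi, yi))
    return models

def majority_vote(models, X):
    # Predict positive when more than half of the members agree.
    votes = np.stack([m.predict(X) for m in models])
    return (votes.mean(axis=0) > 0.5).astype(int)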
5.5 INTERNAL IMBALANCE LEARNING METHODS FOR SVMs:
ALGORITHMIC METHODS
In this section, we present the algorithmic modifications proposed in the literature
to make the SVM algorithm less sensitive to class imbalance.