methods include random and focused under/oversampling methods and synthetic
data generation methods such as the synthetic minority oversampling technique
(SMOTE) [11]. Resampling methods have been successfully applied to train
SVMs with imbalanced datasets in different domains [10-16].
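As a concrete illustration, the following is a minimal sketch of SMOTE-based resampling before SVM training, assuming the third-party imbalanced-learn library and scikit-learn; the toy data and class sizes are purely illustrative.

import numpy as np
from imblearn.over_sampling import SMOTE
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Toy imbalanced dataset: 500 negative versus 50 positive examples.
X = np.vstack([rng.normal(0.0, 1.0, (500, 2)),
               rng.normal(2.0, 1.0, (50, 2))])
y = np.array([0] * 500 + [1] * 50)

# SMOTE synthesizes new minority examples along line segments joining
# neighboring minority points, balancing the classes before training.
X_res, y_res = SMOTE(random_state=0).fit_resample(X, y)
clf = SVC(kernel="rbf").fit(X_res, y_res)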
In particular, Batuwita and Palade [17] present an efficient focused oversampling method for SVMs. In this method, the separating hyperplane found by training an SVM model on the original imbalanced dataset is first used to select the most informative examples for a given classification problem, namely the data points lying around the class-boundary region. Then, only these selected examples are balanced by oversampling, as opposed to blindly oversampling the complete dataset. This method reduces the SVM training time significantly while obtaining classification results comparable to those of the original oversampling method.
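A sketch in the spirit of this method is given below, under two simplifying assumptions that are not part of the original formulation: the boundary region is taken to be the margin band where |f(x)| < 1 for the initial SVM's decision function f, and the oversampling step is plain duplication of minority examples.

import numpy as np
from sklearn.svm import SVC

def focused_oversample(X, y, random_state=0):
    rng = np.random.default_rng(random_state)
    # Step 1: train an initial SVM on the original imbalanced data.
    initial = SVC(kernel="rbf").fit(X, y)
    # Step 2: keep only examples near the separating hyperplane,
    # i.e., those falling inside the margin band |f(x)| < 1.
    near = np.abs(initial.decision_function(X)) < 1.0
    Xb, yb = X[near], y[near]
    # Step 3: oversample only the minority points in that region
    # (duplication here; SMOTE-style synthesis is equally possible).
    pos, neg = Xb[yb == 1], Xb[yb == 0]
    if 0 < len(pos) < len(neg):
        idx = rng.integers(0, len(pos), size=len(neg) - len(pos))
        Xb = np.vstack([Xb, pos[idx]])
        yb = np.concatenate([yb, np.ones(len(idx), dtype=int)])
    # Step 4: retrain on the balanced boundary-region examples only.
    return SVC(kernel="rbf").fit(Xb, yb)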
The support cluster machines (SCM) method presented in [18] can be viewed as another focused resampling method for SVMs. This method first partitions the negative examples into disjoint clusters using the kernel k-means clustering method. Then, it trains an initial SVM model using the positive examples and the representatives of the negative clusters, namely the data examples representing the cluster centers. Using the global picture provided by this initial SVM, the method approximately identifies the support vectors and nonsupport vectors, and a shrinking technique is then used to remove the examples that are most probably not support vectors. This procedure of clustering and shrinking is repeated iteratively until convergence.
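A rough sketch of this procedure follows, with two simplifications that are assumptions rather than part of the original method: plain k-means stands in for kernel k-means, and a single shrinking pass replaces the iteration to convergence.

import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import SVC

def scm_sketch(X, y, n_clusters=20, random_state=0):
    X_pos, X_neg = X[y == 1], X[y == 0]
    # Step 1: partition the negatives into disjoint clusters and take
    # the cluster centers as their representatives.
    km = KMeans(n_clusters=n_clusters, n_init=10,
                random_state=random_state).fit(X_neg)
    reps = km.cluster_centers_
    # Step 2: train an initial SVM on positives plus representatives.
    X0 = np.vstack([X_pos, reps])
    y0 = np.array([1] * len(X_pos) + [0] * len(reps))
    initial = SVC(kernel="rbf").fit(X0, y0)
    # Step 3 (shrinking): drop negatives lying far from the boundary,
    # which are most probably not support vectors, then retrain.
    keep = initial.decision_function(X_neg) > -1.0
    X1 = np.vstack([X_pos, X_neg[keep]])
    y1 = np.array([1] * len(X_pos) + [0] * int(keep.sum()))
    return SVC(kernel="rbf").fit(X1, y1)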
5.4.2 Ensemble Learning Methods
Ensemble learning has also been applied as a solution for training SVMs with
imbalanced datasets [19-22]. Generally, in these methods, the majority class
dataset is divided into multiple sub-datasets such that each sub-dataset contains
approximately the same number of examples as the minority class dataset. This
can be done by random sampling with or without replacement (bootstrapping),
or through clustering methods. Then, a set of SVM classifiers is developed, each
trained with the same positive dataset and a different negative sub-dataset.
Finally, the decisions made by the classifier ensemble are combined using a
method such as majority voting. In addition, specialized boosting algorithms,
such as AdaCost [23], RareBoost [24], and SMOTEBoost [25], which have been
used in class imbalance learning in ensemble settings, could also be applied
with SVMs.
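The following is a minimal sketch of the ensemble scheme just described, assuming random undersampling of the negatives without replacement and majority voting; the ensemble size of five is an arbitrary choice.

import numpy as np
from sklearn.svm import SVC

def train_svm_ensemble(X, y, n_models=5, random_state=0):
    rng = np.random.default_rng(random_state)
    X_pos, X_neg = X[y == 1], X[y == 0]
    models = []
    for _ in range(n_models):
        # Draw (without replacement) as many negatives as there are
        # positives, so each member sees a balanced training set.
        idx = rng.choice(len(X_neg), size=len(X_pos), replace=False)
        Xi = np.vstack([X_pos, X_neg[idx]])
        yi = np.array([1] * len(X_pos) + [0] * len(X_pos))
        models.append(SVC(kernel="rbf").fit(Xi, yi))
    return models

def majority_vote(models, X):
    # Predict positive when more than half of the members agree.
    votes = np.stack([m.predict(X) for m in models])
    return (votes.mean(axis=0) > 0.5).astype(int)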
5.5 INTERNAL IMBALANCE LEARNING METHODS FOR SVMs:
ALGORITHMIC METHODS
In this section, we present the algorithmic modifications proposed in the literature
to make the SVM algorithm less sensitive to class imbalance.