1.2 STATE-OF-THE-ART RESEARCH
Given the new challenges facing imbalanced learning, extensive efforts and significant progress have been made in the community to tackle this problem. In this section, we provide a brief summary of the major categories of approaches for imbalanced learning. Our goal is to highlight some of the major research methodologies while directing readers to different chapters in this book for the latest research developments in each category of approach. Furthermore, a comprehensive summary and critical review of various types of imbalanced learning techniques can be found in a recent survey [1].
1.2.1 Sampling Methods
Sampling methods seem to be the dominant type of approach in the community
as they tackle imbalanced learning in a straightforward manner. In general, the
use of sampling methods in imbalanced learning consists of the modification
of an imbalanced dataset by some mechanism in order to provide a balanced
distribution. Representative work in this area includes random oversampling [9],
random undersampling [10], synthetic sampling with data generation [5, 11-13],
cluster-based sampling methods [14], and integration of sampling and boosting
[6, 15, 16].
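As a concrete illustration, random oversampling and random undersampling can be sketched in a few lines. This is a minimal sketch using NumPy; the toy 90:10 class ratio, array names, and fixed random seed are illustrative assumptions, not part of any specific published algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy imbalanced dataset: 90 majority examples (label 0), 10 minority (label 1).
X = rng.normal(size=(100, 2))
y = np.array([0] * 90 + [1] * 10)

min_idx = np.flatnonzero(y == 1)
maj_idx = np.flatnonzero(y == 0)

# Random oversampling: replicate randomly selected minority examples
# until both classes have the same number of examples.
extra = rng.choice(min_idx, size=len(maj_idx) - len(min_idx), replace=True)
X_over = np.vstack([X, X[extra]])
y_over = np.concatenate([y, y[extra]])

# Random undersampling: instead, discard randomly selected majority examples.
keep = rng.choice(maj_idx, size=len(min_idx), replace=False)
X_under = np.vstack([X[keep], X[min_idx]])
y_under = np.concatenate([y[keep], y[min_idx]])
```

Both variants yield a balanced class distribution; the trade-off is that oversampling enlarges the dataset by duplicating minority points (risking overfitting), while undersampling discards potentially useful majority examples.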
The key aspect of sampling methods is the mechanism used to sample the
original dataset. Under different assumptions and with different objective consid-
erations, various approaches have been proposed. For instance, the mechanism
of random oversampling follows naturally from its description by replicating
a randomly selected set of examples from the minority class. On the basis of
such simple sampling techniques, many informed sampling methods have been
proposed, such as the EasyEnsemble and BalanceCascade algorithms [17]. Syn-
thetic sampling with data generation techniques has also attracted much attention.
For example, the synthetic minority oversampling technique (SMOTE) algorithm
creates artificial data based on the feature space similarities between existing
minority examples [5]. Adaptive sampling methods have also been proposed, such
as the borderline-SMOTE [11] and adaptive synthetic (ADASYN) sampling [12]
algorithms. Sampling strategies have also been integrated with ensemble learn-
ing techniques by the community, such as in SMOTEBoost [15], RAMOBoost
[18], and DataBoost-IM [6]. Data-cleaning techniques, such as Tomek links [19],
have been effectively applied to remove the class overlap introduced by sampling methods for imbalanced learning. Representative work in this area includes the one-sided selection (OSS) method [13] and the neighborhood cleaning rule (NCL) [20].
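The core idea behind SMOTE [5] is to synthesize new minority examples by interpolating between an existing minority example and one of its minority-class nearest neighbors. The sketch below illustrates only that interpolation step; the function name `smote_like`, the Euclidean neighbor search over the minority set alone, and the parameter defaults are simplifying assumptions, not the full published algorithm.

```python
import numpy as np

def smote_like(X_min, n_new, k=5, rng=None):
    """Generate n_new synthetic minority points, each placed on the line
    segment between a randomly chosen minority example and one of its
    k nearest minority-class neighbors."""
    rng = rng or np.random.default_rng(0)
    synthetic = []
    for _ in range(n_new):
        i = rng.integers(len(X_min))
        x = X_min[i]
        # Euclidean distances from x to every minority example.
        d = np.linalg.norm(X_min - x, axis=1)
        neighbors = np.argsort(d)[1:k + 1]   # skip x itself (distance 0)
        j = rng.choice(neighbors)
        gap = rng.random()                   # interpolation factor in [0, 1)
        synthetic.append(x + gap * (X_min[j] - x))
    return np.array(synthetic)

X_min = np.random.default_rng(1).normal(size=(10, 2))
X_new = smote_like(X_min, n_new=40)          # 40 synthetic minority points
```

Because each synthetic point is a convex combination of two existing minority examples, the generated data stay within the region spanned by the minority class, which is what distinguishes synthetic sampling from simple replication.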
1.2.2 Cost-Sensitive Methods
Cost-sensitive learning methods target the problem of imbalanced learning by
using different cost matrices that describe the costs for misclassifying any