1.2 STATE-OF-THE-ART RESEARCH
Given the new challenges facing imbalanced learning, extensive efforts and significant progress have been made in the community to tackle this problem. In this section, we provide a brief summary of the major categories of approaches for imbalanced learning. Our goal is to highlight some of the major research methodologies while directing readers to different chapters in this book for the latest research developments in each category of approach. Furthermore, a comprehensive summary and critical review of various types of imbalanced learning techniques can be found in a recent survey [1].
1.2.1 Sampling Methods
Sampling methods seem to be the dominant type of approach in the community
as they tackle imbalanced learning in a straightforward manner. In general, the
use of sampling methods in imbalanced learning consists of the modification
of an imbalanced dataset by some mechanism in order to provide a balanced
distribution. Representative work in this area includes random oversampling [9],
random undersampling [10], synthetic sampling with data generation [5, 11-13],
cluster-based sampling methods [14], and integration of sampling and boosting
[6, 15, 16].
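As a concrete illustration, random oversampling and random undersampling can be sketched in a few lines. This is a minimal sketch using NumPy; the toy 90:10 class ratio, array names, and fixed random seed are illustrative assumptions, not part of any specific published algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy imbalanced dataset: 90 majority examples (label 0), 10 minority (label 1).
X = rng.normal(size=(100, 2))
y = np.array([0] * 90 + [1] * 10)

min_idx = np.flatnonzero(y == 1)
maj_idx = np.flatnonzero(y == 0)

# Random oversampling: replicate randomly selected minority examples
# until both classes have the same number of examples.
extra = rng.choice(min_idx, size=len(maj_idx) - len(min_idx), replace=True)
X_over = np.vstack([X, X[extra]])
y_over = np.concatenate([y, y[extra]])

# Random undersampling: instead, discard randomly selected majority examples.
keep = rng.choice(maj_idx, size=len(min_idx), replace=False)
X_under = np.vstack([X[keep], X[min_idx]])
y_under = np.concatenate([y[keep], y[min_idx]])
```

Both variants yield a balanced class distribution; the trade-off is that oversampling enlarges the dataset by duplicating minority points (risking overfitting), while undersampling discards potentially useful majority examples.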
The key aspect of sampling methods is the mechanism used to sample the
original dataset. Under different assumptions and with different objective consid-
erations, various approaches have been proposed. For instance, the mechanism
of random oversampling follows naturally from its description by replicating
a randomly selected set of examples from the minority class. On the basis of
such simple sampling techniques, many informed sampling methods have been
proposed, such as the EasyEnsemble and BalanceCascade algorithms [17]. Syn-
thetic sampling with data generation techniques has also attracted much attention.
For example, the synthetic minority oversampling technique (SMOTE) algorithm
creates artificial data based on the feature space similarities between existing
minority examples [5]. Adaptive sampling methods have also been proposed, such
as the borderline-SMOTE [11] and adaptive synthetic (ADASYN) sampling [12]
algorithms. Sampling strategies have also been integrated with ensemble learn-
ing techniques by the community, such as in SMOTEBoost [15], RAMOBoost
[18], and DataBoost-IM [6]. Data-cleaning techniques, such as Tomek links [19],
have been effectively applied to remove the class overlap introduced by sampling methods for imbalanced learning. Representative work in this area includes the one-sided selection (OSS) method [13] and the neighborhood cleaning rule (NCL) [20].
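The core idea behind SMOTE [5] is to synthesize new minority examples by interpolating between an existing minority example and one of its minority-class nearest neighbors. The sketch below illustrates only that interpolation step; the function name `smote_like`, the Euclidean neighbor search over the minority set alone, and the parameter defaults are simplifying assumptions, not the full published algorithm.

```python
import numpy as np

def smote_like(X_min, n_new, k=5, rng=None):
    """Generate n_new synthetic minority points, each placed on the line
    segment between a randomly chosen minority example and one of its
    k nearest minority-class neighbors."""
    rng = rng or np.random.default_rng(0)
    synthetic = []
    for _ in range(n_new):
        i = rng.integers(len(X_min))
        x = X_min[i]
        # Euclidean distances from x to every minority example.
        d = np.linalg.norm(X_min - x, axis=1)
        neighbors = np.argsort(d)[1:k + 1]   # skip x itself (distance 0)
        j = rng.choice(neighbors)
        gap = rng.random()                   # interpolation factor in [0, 1)
        synthetic.append(x + gap * (X_min[j] - x))
    return np.array(synthetic)

X_min = np.random.default_rng(1).normal(size=(10, 2))
X_new = smote_like(X_min, n_new=40)          # 40 synthetic minority points
```

Because each synthetic point is a convex combination of two existing minority examples, the generated data stay within the region spanned by the minority class, which is what distinguishes synthetic sampling from simple replication.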
1.2.2 Cost-Sensitive Methods
Cost-sensitive learning methods target the problem of imbalanced learning by
using different cost matrices that describe the costs for misclassifying any