iterative step of active learning instead of querying the entire dataset. Active learning has also been integrated with sampling techniques. For instance, Zhu and Hovy [54] analyzed the effect of undersampling and oversampling techniques combined with active learning for the imbalanced word sense disambiguation (WSD) problem. Another active learning sampling method is the simple active learning heuristic (SALH) approach proposed in [55]. The main aim of this method is to provide a generic model for the evolution of genetic programming (GP) classifiers by integrating stochastic subsampling with a modified Wilcoxon-Mann-Whitney (WMW) cost function [55]. Major advantages of the SALH method include the ability to actively bias the data distribution for learning, a robust cost function, and a reduced computational cost for fitness evaluation.
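To make the role of the WMW cost concrete, the following is a minimal sketch in Python of a WMW-style pairwise cost. The function name wmw_cost, the margin gamma, and the exponent p are illustrative assumptions and do not reproduce the exact formulation in [55]; the sketch only conveys the core idea that penalizing (positive, negative) score pairs with small margins approximately maximizes the area under the ROC curve.

import numpy as np

def wmw_cost(scores_pos, scores_neg, gamma=0.3, p=2):
    # Pairwise margins s_i - s_j for every (positive, negative) pair.
    margins = scores_pos[:, None] - scores_neg[None, :]
    # Hinge-like penalty on pairs whose margin falls below gamma;
    # minimizing the mean penalty approximately maximizes AUC.
    penalties = np.where(margins < gamma, (gamma - margins) ** p, 0.0)
    return penalties.mean()

# A scorer that ranks all positives above all negatives incurs zero cost.
pos = np.array([0.9, 0.8, 0.7])
neg = np.array([0.2, 0.1])
print(wmw_cost(pos, neg))  # 0.0: every pair respects the margin

In a GP setting such as SALH, a cost of this form can serve directly as the fitness function evaluated on a stochastic subsample of the training data.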
1.2.5 One-Class Learning Methods
One-class learning, or novelty detection, methods have also attracted much attention in the imbalanced learning community [4]. Generally speaking, this category of approaches aims to recognize instances of a concept using mainly, or only, examples of a single class (i.e., a recognition-based methodology) rather than differentiating between instances of the positive and negative classes as in conventional learning approaches (i.e., a discrimination-based inductive methodology). Representative work in this area includes one-class SVMs [56, 57] and the autoassociator (or autoencoder) method [58-60]. For instance, in [59], a comparison between different sampling methods and the one-class autoassociator method was presented. A novelty detection approach based on redundancy compression and nonredundancy differentiation techniques was investigated in [60]. Lee and Cho [61] suggested that novelty detection methods are particularly useful for extremely imbalanced datasets, whereas regular discrimination-based inductive classifiers are better suited to moderately imbalanced datasets.
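As an illustration of the recognition-based methodology, the sketch below fits scikit-learn's OneClassSVM on examples of a single class and flags everything else as a novelty. The synthetic data, the choice to fit on the majority class, and the nu value are assumptions made for illustration; they are not settings taken from [56, 57].

import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
# Majority ("normal") class: 500 points around the origin.
X_majority = rng.normal(0.0, 1.0, size=(500, 2))
# Rare minority class: 10 points far from the majority region.
X_minority = rng.normal(5.0, 0.5, size=(10, 2))

# Recognition-based learning: fit on the single majority class only.
# nu bounds the fraction of training points treated as outliers;
# 0.05 is an illustrative choice, not a recommended setting.
clf = OneClassSVM(kernel="rbf", gamma="scale", nu=0.05)
clf.fit(X_majority)

# predict() returns +1 for points recognized as the learned class and
# -1 for novelties; the minority examples should mostly map to -1.
print(clf.predict(X_minority))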
Although current efforts in the community focus largely on two-class imbalanced problems, multi-class imbalanced learning problems also exist and have been investigated in numerous works. For instance, in [62], the cost-sensitive boosting algorithm AdaC2.M1 was proposed to tackle the class imbalance problem with multiple classes. In [63], an iterative method for multi-class cost-sensitive learning was proposed. Other works on multi-class imbalanced learning include the min-max modular network [64] and the rescaling approach for multi-class cost-sensitive neural networks [65], to name a few.
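To give a flavor of the rescaling idea in the multi-class setting, the sketch below weights each class in inverse proportion to its frequency before training a standard classifier. The toy data, the weighting formula, and the use of logistic regression are assumptions for illustration; the sketch does not replicate the neural network formulation of [65].

from collections import Counter

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
# Three-class toy problem with imbalanced class sizes 300 : 60 : 15.
sizes = {0: 300, 1: 60, 2: 15}
X = np.vstack([rng.normal(c * 3.0, 1.0, size=(n, 2)) for c, n in sizes.items()])
y = np.concatenate([np.full(n, c) for c, n in sizes.items()])

# Rescaling: weight each class inversely to its frequency so that rare
# classes contribute comparably to the training loss.
counts = Counter(y)
weights = {c: len(y) / (len(counts) * n) for c, n in counts.items()}

clf = LogisticRegression(class_weight=weights, max_iter=1000)
clf.fit(X, y)
print(weights)  # rare classes receive proportionally larger weights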
Our discussion in this section by no means covers the complete set of methods for tackling the imbalanced learning problem, given the variety of assumptions about imbalanced data and the different learning objectives of different applications. Interested readers can refer to [1] for a recent survey of imbalanced learning methods. The latest research developments on this topic can be found in the following chapters.