hypothesis is qualified enough to be accommodated at the expense of removing an existing hypothesis. Addressing the choice between new and old data directly, Fan [16] examined whether referring to the old data is necessary at all. If it is not, retaining only the most recent data suffices to yield a hypothesis with satisfactory performance. Otherwise, cross-validation is applied to locate the portion of old data that best complements the most recent data in building an optimal hypothesis. The potential problem with this approach is the choice of granularity for cross-validation: finer granularity locates the desirable portion of the old data more accurately, but at extra computational cost. When the granularity is refined to the level of individual examples, cross-validation degenerates into a brute-force search, which may be intractable for speed-sensitive applications.
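As a rough illustration at chunk granularity, the sketch below admits an old data chunk only when it improves accuracy on a validation slice held out from the most recent chunk. This is not Fan's exact procedure; the chunk granularity, the validation split, and the decision-tree base learner are all assumptions.

    import numpy as np
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier

    def select_old_chunks(recent_X, recent_y, old_chunks, seed=0):
        # Hold out part of the most recent chunk as the validation yardstick.
        X_tr, X_val, y_tr, y_val = train_test_split(
            recent_X, recent_y, test_size=0.3, random_state=seed)
        base = DecisionTreeClassifier(random_state=seed).fit(
            X_tr, y_tr).score(X_val, y_val)
        kept_X, kept_y, selected = X_tr, y_tr, []
        for i, (X_old, y_old) in enumerate(old_chunks):
            cand_X = np.vstack([kept_X, X_old])
            cand_y = np.concatenate([kept_y, y_old])
            score = DecisionTreeClassifier(random_state=seed).fit(
                cand_X, cand_y).score(X_val, y_val)
            if score > base:  # this old portion complements the recent data
                kept_X, kept_y, base = cand_X, cand_y, score
                selected.append(i)
        return kept_X, kept_y, selected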
Other ways of countering concept drift include the sliding window method [17], which maintains a window of fixed or adaptively adjusted size to determine the time frame of the knowledge that should be retained, and the fading factor method [18], which assigns a time-decaying factor (usually an inverse exponential) to each hypothesis built over time. In this manner, old knowledge gradually becomes obsolete and can be removed once its factor drops below a threshold.
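A minimal sketch of the fading-factor idea follows; the decay constant, the pruning threshold, and the weighted-vote prediction rule are illustrative choices rather than the specific scheme of [18].

    class FadingEnsemble:
        # Each stored hypothesis carries a weight that decays with age
        # and is pruned once it falls below a threshold.
        def __init__(self, decay=0.9, threshold=0.05):
            self.decay = decay          # multiplicative decay per time step
            self.threshold = threshold  # prune weights that fall below this
            self.members = []           # list of (hypothesis, weight) pairs

        def add(self, model):
            # Age the existing hypotheses, drop obsolete ones, admit the new one.
            self.members = [(m, w * self.decay) for m, w in self.members
                            if w * self.decay >= self.threshold]
            self.members.append((model, 1.0))

        def predict(self, X):
            # Weighted majority vote, assuming binary labels in {0, 1}.
            votes = sum(w * m.predict(X) for m, w in self.members)
            total = sum(w for _, w in self.members)
            return (votes >= total / 2).astype(int)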
Despite the popularity of data stream research, learning from nonstationary data streams with imbalanced class distribution remains a relatively uncharted area, and its difficulty lies precisely in this combined context. In a static context, the counterpart of this problem is known as "imbalanced learning," which corresponds to domains where certain types of data distribution over-dominate the instance space compared to other data distributions [19]. It is an evolving area that has attracted significant attention in the community [20-24]. The available solutions become rather limited, however, when imbalanced learning is set in the context of data streams.
A rather straightforward way is to apply off-the-shelf imbalanced learning methods to over-sample the minority class examples in each data chunk. Following this idea, Ditzler and Chawla [25] used the synthetic minority over-sampling technique (SMOTE) [21] to create synthetic minority class instances in each data chunk arriving over time, and then applied the typical Learn++ framework [8] to learn from the balanced data chunks.
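In code, this per-chunk balancing step might look like the following sketch, which assumes the imbalanced-learn library's SMOTE implementation and a decision-tree base learner; the Learn++ ensemble machinery itself is omitted.

    from imblearn.over_sampling import SMOTE
    from sklearn.tree import DecisionTreeClassifier

    def learn_from_chunk(X_chunk, y_chunk, ensemble):
        # Synthesize minority class examples within the arriving chunk only.
        X_bal, y_bal = SMOTE().fit_resample(X_chunk, y_chunk)
        # Train one new hypothesis on the balanced chunk; add it to the ensemble.
        ensemble.append(DecisionTreeClassifier().fit(X_bal, y_bal))
        return ensemble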
A different way to compensate for the imbalanced class ratio within each data chunk is to directly introduce previous minority class examples into the current training data chunk. In [26], all previous minority class examples are accommodated into the current training data chunk, upon which an ensemble of hypotheses is then built to make predictions on the datasets under evaluation.
Considering the evolution of class concepts over time, an obvious heuristic for improving this idea is to accommodate only those previous minority class examples that are most similar to the minority class set in the current training data chunk. The selectively recursive approach (SERA) [27], multiple SERA (MuSeRA) [28], and reconstruct-evolve-average (REA) [29] all stem from this idea; they differ in whether they build a single hypothesis or an ensemble of hypotheses, as well as in how they measure the similarity between previous and current minority class examples.
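To make the selection step concrete, the sketch below borrows the previous minority class examples closest to the centroid of the current minority set. The Euclidean distance and the borrowing ratio are stand-ins only; SERA, MuSeRA, and REA each define their own similarity measure.

    import numpy as np

    def select_similar_minority(prev_min_X, cur_min_X, ratio=0.5):
        # Rank previous minority examples by distance to the current
        # minority centroid and keep only the closest fraction.
        center = cur_min_X.mean(axis=0)
        dists = np.linalg.norm(prev_min_X - center, axis=1)
        k = min(len(prev_min_X), int(ratio * len(cur_min_X)))
        return prev_min_X[np.argsort(dists)[:k]]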