hypothesis is qualified enough to be accommodated at the expense of removing an existing hypothesis. Addressing the choice between new and old data directly, Fan [16] examined whether referring to the old data is necessary at all. If it is not, retaining only the most recent data suffices to yield a hypothesis with satisfactory performance. Otherwise, cross-validation is applied to locate the portion of old data that best complements the most recent data in building an optimal hypothesis. The potential problem with this approach is the choice of granularity for cross-validation: finer granularity locates the desirable portion of the old data more accurately, but at extra computational cost. When the granularity is refined to the level of individual examples, cross-validation degenerates into a brute-force search, which may be intractable for speed-sensitive applications.
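As a rough illustration at chunk granularity, the sketch below admits an old data chunk only when it improves accuracy on a validation slice held out from the most recent chunk. This is not Fan's exact procedure; the chunk granularity, the validation split, and the decision-tree base learner are all assumptions.

    import numpy as np
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier

    def select_old_chunks(recent_X, recent_y, old_chunks, seed=0):
        # Hold out part of the most recent chunk as the validation yardstick.
        X_tr, X_val, y_tr, y_val = train_test_split(
            recent_X, recent_y, test_size=0.3, random_state=seed)
        base = DecisionTreeClassifier(random_state=seed).fit(
            X_tr, y_tr).score(X_val, y_val)
        kept_X, kept_y, selected = X_tr, y_tr, []
        for i, (X_old, y_old) in enumerate(old_chunks):
            cand_X = np.vstack([kept_X, X_old])
            cand_y = np.concatenate([kept_y, y_old])
            score = DecisionTreeClassifier(random_state=seed).fit(
                cand_X, cand_y).score(X_val, y_val)
            if score > base:  # this old portion complements the recent data
                kept_X, kept_y, base = cand_X, cand_y, score
                selected.append(i)
        return kept_X, kept_y, selected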
Other ways of countering concept drift include the sliding window method [17], which maintains a window of fixed or adaptively adjusted size to determine the time frame of the knowledge that should be retained, and the fading factor method [18], which assigns a time-decaying factor (usually an inverse exponential) to each hypothesis built over time. In this manner, old knowledge gradually becomes obsolete and can be removed once its factor drops below a threshold.
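A minimal sketch of the fading-factor idea follows; the decay constant, the pruning threshold, and the weighted-vote prediction rule are illustrative choices rather than the specific scheme of [18].

    class FadingEnsemble:
        # Each stored hypothesis carries a weight that decays with age
        # and is pruned once it falls below a threshold.
        def __init__(self, decay=0.9, threshold=0.05):
            self.decay = decay          # multiplicative decay per time step
            self.threshold = threshold  # prune weights that fall below this
            self.members = []           # list of (hypothesis, weight) pairs

        def add(self, model):
            # Age the existing hypotheses, drop obsolete ones, admit the new one.
            self.members = [(m, w * self.decay) for m, w in self.members
                            if w * self.decay >= self.threshold]
            self.members.append((model, 1.0))

        def predict(self, X):
            # Weighted majority vote, assuming binary labels in {0, 1}.
            votes = sum(w * m.predict(X) for m, w in self.members)
            total = sum(w for _, w in self.members)
            return (votes >= total / 2).astype(int)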
Despite the popularity of data stream research, learning from nonstationary data streams with imbalanced class distribution remains a relatively uncharted area, and its difficulty lies precisely in this combined context. In a static context, the counterpart of this problem is known as "imbalanced learning," which corresponds to domains where certain types of data distribution over-dominate the instance space compared to other data distributions [19]. It is an evolving area that has attracted significant attention in the community [20-24]. The available solutions become rather limited, however, when imbalanced learning is set in the context of data streams.
A rather straightforward way is to apply off-the-shelf imbalanced learning methods to over-sample the minority class examples in each data chunk. Following this idea, Ditzler and Chawla [25] used the synthetic minority over-sampling technique (SMOTE) [21] to create synthetic minority class instances in each data chunk arriving over time, and then applied the typical Learn++ framework [8] to learn from the balanced data chunks.
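In code, this per-chunk balancing step might look like the following sketch, which assumes the imbalanced-learn library's SMOTE implementation and a decision-tree base learner; the Learn++ ensemble machinery itself is omitted.

    from imblearn.over_sampling import SMOTE
    from sklearn.tree import DecisionTreeClassifier

    def learn_from_chunk(X_chunk, y_chunk, ensemble):
        # Synthesize minority class examples within the arriving chunk only.
        X_bal, y_bal = SMOTE().fit_resample(X_chunk, y_chunk)
        # Train one new hypothesis on the balanced chunk; add it to the ensemble.
        ensemble.append(DecisionTreeClassifier().fit(X_bal, y_bal))
        return ensemble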
A different way to compensate for the imbalanced class ratio within each data chunk is to directly introduce previous minority class examples into the current training data chunk. In [26], all previous minority class examples are accommodated into the current training data chunk, upon which an ensemble of hypotheses is then built to make predictions on the datasets under evaluation.
Considering the evolution of class concepts over time, an obvious heuristic for improving this idea is to accommodate only those previous minority class examples that are most similar to the minority class set in the current training data chunk. The selectively recursive approach (SERA) [27], multiple SERA (MuSeRA) [28], and reconstruct-evolve-average (REA) [29] all stem from this idea; they differ in whether they build a single hypothesis or an ensemble of hypotheses, as well as in how they measure the similarity between previous and current minority class examples.
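To make the selection step concrete, the sketch below borrows the previous minority class examples closest to the centroid of the current minority set. The Euclidean distance and the borrowing ratio are stand-ins only; SERA, MuSeRA, and REA each define their own similarity measure.

    import numpy as np

    def select_similar_minority(prev_min_X, cur_min_X, ratio=0.5):
        # Rank previous minority examples by distance to the current
        # minority centroid and keep only the closest fraction.
        center = cur_min_X.mean(axis=0)
        dists = np.linalg.norm(prev_min_X - center, axis=1)
        k = min(len(prev_min_X), int(ratio * len(cur_min_X)))
        return prev_min_X[np.argsort(dists)[:k]]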