NONSTATIONARY STREAM DATA
LEARNING WITH IMBALANCED
CLASS DISTRIBUTION
SHENG CHEN
Merrill Lynch, Bank of America, New York, NY, USA
HAIBO HE
Department of Electrical, Computer, and Biomedical Engineering, University of Rhode
Island, Kingston, RI, USA
Abstract: The ubiquity of imbalanced class distributions in real-world datasets
has stirred considerable interest in the study of imbalanced learning. However,
learning from a nonstationary data stream with an imbalanced class distribution
remains a relatively uncharted area. Difficulties in this
case are generally twofold. First, a dynamically structured learning framework is
required to keep up with the evolution of unstable class concepts, that is, concept
drift. Second, an imbalanced class distribution over data streams demands a
mechanism to reinforce the underrepresented class concepts for improved overall
performance. For instance, an intelligent spam filtering system must self-tune
its learning parameters to keep pace with the rapid evolution of spam mail
patterns, and it must also handle the fundamental problem that normal emails are
severely outnumbered by spam emails in some situations, even though misclassifying
a normal email (for example, the confirmation of a business contract) as spam is
far more expensive than the other way around. This
chapter introduces learning algorithms that were specifically proposed to tackle the
problem of learning from nonstationary datasets with imbalanced class distribution.
The system-level principles and framework of these methods are described at an
algorithmic level, and their soundness is further validated through theoretical
analysis as well as simulations on both synthetic and real-world benchmarks with
varied levels of imbalance ratio and noise.