that is, the ones using synthetic minority class examples (SMOTE/Learn++) and the others using previous minority class examples (UB/SERA/MuSeRA/REA). The second category can be further divided into algorithms that use all previous minority class examples and those that use only part of them. The key idea of using partial previous minority class examples to compensate for the imbalanced class ratio is to find a measurement of the similarity between each previous minority class example and the current minority class set; this splits the algorithms in this subcategory into the ones using the Mahalanobis distance and the others using the k-nearest-neighbor rule (both measurements are sketched below). This completes the taxonomy of algorithms.
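To make the two similarity measurements concrete, the following is a minimal sketch (assuming NumPy); the function and variable names are illustrative, not taken from the original algorithms. Smaller Mahalanobis distances, and larger k-nearest-neighbor fractions, indicate previous minority examples more similar to the current minority class set.

import numpy as np

def mahalanobis_similarity(prev_minority, cur_minority):
    """Mahalanobis distance from each previous minority example to the
    current minority set; smaller distance means more similar."""
    mu = cur_minority.mean(axis=0)
    # Pseudo-inverse guards against a singular covariance matrix.
    cov_inv = np.linalg.pinv(np.cov(cur_minority, rowvar=False))
    diff = prev_minority - mu
    sq = np.einsum('ij,jk,ik->i', diff, cov_inv, diff)
    return np.sqrt(np.maximum(sq, 0.0))

def knn_similarity(prev_minority, cur_chunk, cur_labels, k=5, minority=1):
    """Fraction of each previous example's k nearest neighbors in the
    current chunk that belong to the minority class; larger means more
    similar."""
    sims = []
    for x in prev_minority:
        d = np.linalg.norm(cur_chunk - x, axis=1)
        nearest = np.argsort(d)[:k]
        sims.append(np.mean(cur_labels[nearest] == minority))
    return np.array(sims)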
The algorithms are introduced through their pseudo-codes, followed by theoretical study and simulations. The results suggest that REA provides the most competitive performance among the algorithms compared. Nevertheless, considering the efficiency required in practice by stream-data applications, SERA may offer the best trade-off between performance and algorithmic complexity.
There are many interesting directions in which the study of learning from nonstationary streams with imbalanced class distributions can be pursued. First, the REA/MuSeRA algorithms need an efficient and concrete mechanism for removing hypotheses with obsolete knowledge on the fly, to account for limited resource availability as well as concept drift. For instance, one could explore integrating the hypothesis-pruning method used by Learn++ into REA/MuSeRA; a sketch of one possible pruning mechanism follows.
Second, the issue of compensating for the imbalanced class ratio can be approached from the opposite direction: one can remove less important majority class examples instead of explicitly augmenting the minority class data. The effect would be the same, with an obvious benefit: no synthetic or previous data needs to be accommodated in the data chunk, so the integrity of the target concept cannot be impaired. The random under-sampling method employed by UB can be regarded as a preliminary effort in this direction, as the following sketch illustrates.
Finally, there seems to be no record of the cost-sensitive learning framework being used to address this problem. One could directly assign different misclassification costs to minority and majority class examples during training, in the hope of better learning performance. Beyond this naive implementation, a smarter approach is to assign different misclassification costs to minority class examples, majority class examples, and previous minority class examples. The misclassification costs for previous minority class examples can even be set nonuniformly, according to how similar they are to the minority class set in the training data chunk under consideration.
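A hedged sketch of this nonuniform cost assignment follows: current minority examples receive the highest cost, majority examples a baseline cost, and previous minority examples a cost scaled by their similarity to the current minority set (e.g., the knn_similarity values from the earlier sketch, assumed to lie in [0, 1]). The cost values, the choice of classifier, and the use of scikit-learn's sample_weight are illustrative assumptions, not part of the original chapter.

import numpy as np
from sklearn.tree import DecisionTreeClassifier

def cost_sensitive_fit(X_cur, y_cur, X_prev_min, similarity,
                       c_min=10.0, c_maj=1.0, minority=1):
    """Train one hypothesis with per-example misclassification costs."""
    weights_cur = np.where(y_cur == minority, c_min, c_maj)
    # Previous minority examples: cost grows with similarity in [0, 1].
    weights_prev = c_maj + (c_min - c_maj) * similarity
    X = np.vstack([X_cur, X_prev_min])
    y = np.concatenate([y_cur, np.full(len(X_prev_min), minority)])
    w = np.concatenate([weights_cur, weights_prev])
    return DecisionTreeClassifier().fit(X, y, sample_weight=w)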
7.6 ACKNOWLEDGMENTS
This work was supported in part by the National Science Foundation (NSF) under Grant ECCS 1053717 and the Army Research Office (ARO) under Grant W911NF-12-1-0378.