set is available, the computational properties of these algorithms can be leveraged
for faster classification and incremental learning. Online learning techniques can
process new data presented one example at a time, selected either by AL or at
random, and can integrate the information from each new example into the model
without retraining on all previously seen data, thereby allowing models to be
constructed incrementally. This working principle of online learning algorithms
yields speed improvements and a reduced memory footprint, making them
applicable to very large datasets. More importantly, the incremental learning
principle suits the nature of AL far more naturally than batch algorithms do.
Empirical evidence indicates that a single presentation of each training example to the
algorithm is sufficient to achieve training errors comparable to those achieved by
the best minimization of the SVM objective [24].
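The contrast with batch training can be made concrete with a minimal, illustrative single-pass learner: stochastic gradient descent on the hinge loss (the SVM objective) over a stream of examples, in pure Python. The learning rate and regularization constant here are arbitrary illustrative choices, not values from the text.

```python
# Minimal sketch of online (incremental) learning: a linear model trained
# by stochastic gradient descent on the hinge loss. Each example is seen
# once and then discarded, so memory use stays constant no matter how
# much data has been streamed.

def online_hinge_sgd(stream, n_features, lr=0.1, lam=0.01):
    """stream yields (x, y) pairs with y in {-1, +1}; x is a list of floats."""
    w = [0.0] * n_features
    b = 0.0
    for x, y in stream:
        margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
        # Regularization shrinks w on every step; the hinge term only
        # fires when the example violates the margin (y * f(x) < 1).
        w = [wi * (1 - lr * lam) for wi in w]
        if margin < 1:
            w = [wi + lr * y * xi for wi, xi in zip(w, x)]
            b += lr * y
    return w, b

# Stream a toy separable problem past the learner, one example at a time.
stream = [([1.0, 0.0], 1), ([-1.0, 0.0], -1)] * 50
w, b = online_hinge_sgd(stream, n_features=2)
```

After a single pass, the learned hyperplane separates the two toy classes even though no example was ever revisited.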
6.3.4 Performance Metrics
Classification accuracy is not a good metric to evaluate classifiers in applications
facing class imbalance problems. SVMs have to achieve a trade-off between
maximizing the margin and minimizing the empirical error. In the non-separable
case, if the misclassification penalty C is very small, the SVM learner simply
tends to classify every example as negative. This extreme approach maximizes
the margin while making no classification errors on the negative instances. The
only error is the cumulative error on the positive instances, which are already few
in number. With an imbalance ratio of 99 to 1, a classifier that labels
everything as negative will be 99% accurate. Obviously, such a scheme would
not have any practical use, as it would be unable to identify positive instances.
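The arithmetic behind this accuracy paradox can be checked directly with a toy dataset at the 99-to-1 ratio:

```python
# Toy illustration of the accuracy paradox under a 99:1 imbalance:
# a classifier that labels everything negative is 99% accurate yet
# never identifies a single positive instance.

y_true = [1] * 1 + [0] * 99   # 1 positive, 99 negatives
y_pred = [0] * 100            # the "always negative" classifier

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
recall = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred)) / sum(y_true)

print(accuracy)  # 0.99
print(recall)    # 0.0 -- no positive instance is ever found
```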
For the evaluation of these results, it is useful to consider several other
prediction performance metrics such as g-means, the area under the curve
(AUC), and the precision-recall break-even point (PRBEP), which are com-
monly used in imbalanced data classification. g-Means [28] is defined as
g = √(sensitivity · specificity), where sensitivity is the accuracy on the positive
instances, given as TruePos/(TruePos + FalseNeg), and specificity is the
accuracy on the negative instances, given as TrueNeg/(TrueNeg + FalsePos).
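As an illustrative sketch, g-means can be computed directly from the four confusion-matrix counts (the counts below are made-up example values):

```python
import math

def g_means(tp, fn, tn, fp):
    """Geometric mean of sensitivity and specificity."""
    sensitivity = tp / (tp + fn)   # accuracy on the positive instances
    specificity = tn / (tn + fp)   # accuracy on the negative instances
    return math.sqrt(sensitivity * specificity)

# Hypothetical confusion matrix: sensitivity = 0.8, specificity = 0.9,
# so g = sqrt(0.8 * 0.9) = sqrt(0.72).
g = g_means(tp=80, fn=20, tn=90, fp=10)
```

Because it is a geometric mean, g-means collapses to zero if either class is classified entirely wrongly, which is exactly what penalizes the "always negative" classifier above.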
The receiver operating characteristic (ROC) curve displays the relationship
between sensitivity and specificity at all possible thresholds of a binary
classification scoring model applied to independent test data. In other words, the
ROC curve plots the true positive rate against the false positive rate as the
decision threshold is
changed. The area under the ROC (AUROC or AUC) is a numerical measure of
a model's discrimination performance and shows how successfully and correctly
the model ranks and thereby separates the positive and negative observations.
Since the AUC metric evaluates the classifier across the entire range of deci-
sion thresholds, it gives a good overview of the performance when the operating
condition for the classifier is unknown or the classifier is expected to be used in
situations with significantly different class distributions.
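One way to make the ranking interpretation concrete: AUC equals the probability that a randomly drawn positive instance is scored above a randomly drawn negative one (with ties counting half). A short, O(n²) sketch, illustrative only, computes it directly from the two score lists:

```python
def auc(scores_pos, scores_neg):
    """AUC as the probability that a random positive outranks a
    random negative; ties contribute one half."""
    wins = 0.0
    for sp in scores_pos:
        for sn in scores_neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))

# Made-up scores: 8 of the 9 positive/negative pairs are ranked
# correctly, so AUC = 8/9.
a = auc([0.9, 0.8, 0.4], [0.7, 0.3, 0.1])
```

Note that this quantity depends only on the ranking of the scores, not on any particular decision threshold, which is why AUC summarizes performance across all operating conditions.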
PRBEP is another commonly used performance metric for imbalanced data
classification. PRBEP is the accuracy of the positive class at the decision
threshold where precision equals recall.
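A minimal sketch of locating the break-even point, by sweeping the decision threshold over the ranked scores and keeping the point where precision and recall are closest (illustrative, not the chapter's implementation):

```python
def prbep(scores, labels):
    """Precision-recall break-even point: sweep the threshold down the
    ranked scores and return precision where it (most nearly) equals
    recall. labels are 0/1."""
    n_pos = sum(labels)
    ranked = sorted(zip(scores, labels), reverse=True)
    tp = 0
    best = None
    for k, (_, y) in enumerate(ranked, start=1):
        tp += y
        precision, recall = tp / k, tp / n_pos
        gap = abs(precision - recall)
        if best is None or gap < best[0]:
            best = (gap, precision)
    return best[1]

# Made-up scores and labels: taking the top 3 predictions yields
# precision = recall = 2/3, the break-even point.
p = prbep([0.9, 0.8, 0.7, 0.6, 0.5], [1, 1, 0, 1, 0])
```

A useful property: precision and recall always coincide exactly when the number of predicted positives equals the number of true positives, so the sweep is guaranteed to find a gap of zero at that cutoff.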