set is available, the computational properties of these algorithms can be leveraged
for faster classification and incremental learning. Online learning techniques can
process new data presented one example at a time, selected either by AL or at
random, and can integrate the information from each new example into the model
without retraining on all previously seen data, thereby allowing models to be
constructed incrementally. This working principle of online learning algorithms
yields speed improvements and a reduced memory footprint, making them
applicable to very large datasets. More importantly, the incremental learning
principle suits the nature of AL far more naturally than batch algorithms do.
Empirical evidence indicates that a single presentation of each training example to the
algorithm is sufficient to achieve training errors comparable to those achieved by
the best minimization of the SVM objective [24].
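The contrast with batch training can be made concrete with a minimal, illustrative single-pass learner: stochastic gradient descent on the hinge loss (the SVM objective) over a stream of examples, in pure Python. The learning rate and regularization constant here are arbitrary illustrative choices, not values from the text.

```python
# Minimal sketch of online (incremental) learning: a linear model trained
# by stochastic gradient descent on the hinge loss. Each example is seen
# once and then discarded, so memory use stays constant no matter how
# much data has been streamed.

def online_hinge_sgd(stream, n_features, lr=0.1, lam=0.01):
    """stream yields (x, y) pairs with y in {-1, +1}; x is a list of floats."""
    w = [0.0] * n_features
    b = 0.0
    for x, y in stream:
        margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
        # Regularization shrinks w on every step; the hinge term only
        # fires when the example violates the margin (y * f(x) < 1).
        w = [wi * (1 - lr * lam) for wi in w]
        if margin < 1:
            w = [wi + lr * y * xi for wi, xi in zip(w, x)]
            b += lr * y
    return w, b

# Stream a toy separable problem past the learner, one example at a time.
stream = [([1.0, 0.0], 1), ([-1.0, 0.0], -1)] * 50
w, b = online_hinge_sgd(stream, n_features=2)
```

After a single pass, the learned hyperplane separates the two toy classes even though no example was ever revisited.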
6.3.4 Performance Metrics
Classification accuracy is not a good metric to evaluate classifiers in applications
facing class imbalance problems. SVMs have to achieve a trade-off between
maximizing the margin and minimizing the empirical error. In the non-separable
case, if the misclassification penalty C is very small, the SVM learner simply
tends to classify every example as negative. This extreme approach maximizes
the margin while making no classification errors on the negative instances. The
only error is the cumulative error on the positive instances, which are already few
in number. With an imbalance ratio of 99 to 1, a classifier that labels
everything as negative will be 99% accurate. Obviously, such a scheme would
not have any practical use, as it would be unable to identify positive instances.
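The arithmetic behind this accuracy paradox can be checked directly with a toy dataset at the 99-to-1 ratio:

```python
# Toy illustration of the accuracy paradox under a 99:1 imbalance:
# a classifier that labels everything negative is 99% accurate yet
# never identifies a single positive instance.

y_true = [1] * 1 + [0] * 99   # 1 positive, 99 negatives
y_pred = [0] * 100            # the "always negative" classifier

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
recall = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred)) / sum(y_true)

print(accuracy)  # 0.99
print(recall)    # 0.0 -- no positive instance is ever found
```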
For the evaluation of these results, it is useful to consider several other
prediction performance metrics such as g-means, the area under the curve
(AUC), and the precision-recall break-even point (PRBEP), which are com-
monly used in imbalanced data classification. g-Means [28] is defined as
g = √(sensitivity · specificity), where sensitivity is the accuracy on the positive
instances, given as TruePos/(TruePos + FalseNeg), and specificity is the
accuracy on the negative instances, given as TrueNeg/(TrueNeg + FalsePos).
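As an illustrative sketch, g-means can be computed directly from the four confusion-matrix counts (the counts below are made-up example values):

```python
import math

def g_means(tp, fn, tn, fp):
    """Geometric mean of sensitivity and specificity."""
    sensitivity = tp / (tp + fn)   # accuracy on the positive instances
    specificity = tn / (tn + fp)   # accuracy on the negative instances
    return math.sqrt(sensitivity * specificity)

# Hypothetical confusion matrix: sensitivity = 0.8, specificity = 0.9,
# so g = sqrt(0.8 * 0.9) = sqrt(0.72).
g = g_means(tp=80, fn=20, tn=90, fp=10)
```

Because it is a geometric mean, g-means collapses to zero if either class is classified entirely wrongly, which is exactly what penalizes the "always negative" classifier above.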
The receiver operating characteristic (ROC) curve displays the relationship
between sensitivity and specificity at all possible thresholds of a binary
classification scoring model applied to independent test data. In other words, the
ROC curve plots the true positive rate against the false positive rate as the
decision threshold is
changed. The area under the ROC (AUROC or AUC) is a numerical measure of
a model's discrimination performance and shows how successfully and correctly
the model ranks and thereby separates the positive and negative observations.
Since the AUC metric evaluates the classifier across the entire range of deci-
sion thresholds, it gives a good overview of the performance when the operating
condition for the classifier is unknown or the classifier is expected to be used in
situations with significantly different class distributions.
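One way to make the ranking interpretation concrete: AUC equals the probability that a randomly drawn positive instance is scored above a randomly drawn negative one (with ties counting half). A short, O(n²) sketch, illustrative only, computes it directly from the two score lists:

```python
def auc(scores_pos, scores_neg):
    """AUC as the probability that a random positive outranks a
    random negative; ties contribute one half."""
    wins = 0.0
    for sp in scores_pos:
        for sn in scores_neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))

# Made-up scores: 8 of the 9 positive/negative pairs are ranked
# correctly, so AUC = 8/9.
a = auc([0.9, 0.8, 0.4], [0.7, 0.3, 0.1])
```

Note that this quantity depends only on the ranking of the scores, not on any particular decision threshold, which is why AUC summarizes performance across all operating conditions.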
PRBEP is another commonly used performance metric for imbalanced data
classification. PRBEP is the accuracy of the positive class at the decision
threshold where precision equals recall.
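A minimal sketch of locating the break-even point, by sweeping the decision threshold over the ranked scores and keeping the point where precision and recall are closest (illustrative, not the chapter's implementation):

```python
def prbep(scores, labels):
    """Precision-recall break-even point: sweep the threshold down the
    ranked scores and return precision where it (most nearly) equals
    recall. labels are 0/1."""
    n_pos = sum(labels)
    ranked = sorted(zip(scores, labels), reverse=True)
    tp = 0
    best = None
    for k, (_, y) in enumerate(ranked, start=1):
        tp += y
        precision, recall = tp / k, tp / n_pos
        gap = abs(precision - recall)
        if best is None or gap < best[0]:
            best = (gap, precision)
    return best[1]

# Made-up scores and labels: taking the top 3 predictions yields
# precision = recall = 2/3, the break-even point.
p = prbep([0.9, 0.8, 0.7, 0.6, 0.5], [1, 1, 0, 1, 0])
```

A useful property: precision and recall always coincide exactly when the number of predicted positives equals the number of true positives, so the sweep is guaranteed to find a gap of zero at that cutoff.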