no other line with the same slope passes through a point with a larger
TPR intercept. Thus, the classifier at that point is optimal under any distribution
assumptions consistent with that slope [34].
While ROC curves provide a visual method for determining the effectiveness
of a classifier, the area under the ROC curve (AUROC) has become the de facto
standard metric for evaluating classifiers under imbalance [35]. This is because
it is independent of both the selected threshold and the class prior probabilities,
and it provides a single number with which to compare classifiers. One of the
main benefits of AUROC is that it can be interpreted as the probability that a
randomly chosen positive-class instance is ranked above a randomly chosen
negative-class instance when instances are sorted by their classification
probabilities.
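This ranking interpretation can be sketched directly: count the fraction of positive/negative pairs in which the positive instance receives the higher score, counting ties as half. The scores below are illustrative values, not data from the text.

```python
def auroc_pairwise(pos_scores, neg_scores):
    """AUROC as the fraction of positive/negative pairs ranked correctly.

    A pair counts as 1 if the positive instance scores higher, 0.5 on a tie.
    """
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))

pos = [0.9, 0.8, 0.6]   # hypothetical scores for positive-class instances
neg = [0.7, 0.4, 0.3]   # hypothetical scores for negative-class instances
print(auroc_pairwise(pos, neg))  # 8 of 9 pairs ranked correctly: 8/9
```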
One way of computing AUROC is, given $n_0$ points of class 0, $n_1$ points of
class 1, and $S_0$ as the sum of the ranks of the class 0 examples [36]:
$$\mathrm{AUROC} = \frac{2S_0 - n_0(n_0 + 1)}{2\,n_0 n_1} \qquad (3.8)$$
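Equation (3.8) can be sketched in a few lines. Ranks are 1-based over the combined sample, with tied scores given their average rank; the scores are hypothetical, and class 0 is taken as the class whose ranks are summed, following the text's notation.

```python
def auroc_rank_sum(class0_scores, class1_scores):
    """AUROC = (2*S0 - n0*(n0+1)) / (2*n0*n1), with average ranks for ties."""
    n0, n1 = len(class0_scores), len(class1_scores)
    combined = sorted(
        [(s, 0) for s in class0_scores] + [(s, 1) for s in class1_scores]
    )
    s0 = 0.0
    i = 0
    while i < len(combined):
        # Find the group of tied scores starting at position i.
        j = i
        while j < len(combined) and combined[j][0] == combined[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0   # average of 1-based ranks i+1 .. j
        for k in range(i, j):
            if combined[k][1] == 0:
                s0 += avg_rank
        i = j
    return (2.0 * s0 - n0 * (n0 + 1)) / (2.0 * n0 * n1)

print(auroc_rank_sum([0.9, 0.8, 0.6], [0.7, 0.4, 0.3]))  # 8/9
```

This rank-sum form gives the same value as exhaustively counting correctly ordered pairs, but in $O(n \log n)$ time rather than $O(n_0 n_1)$.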
3.4.3 Precision and Recall
Alternatives to AUROC are precision and recall. Precision and recall can be
computed from the confusion matrix (Fig. 3.1) as [37]:
$$\text{precision} = \frac{TP}{TP + FP} \qquad (3.9)$$
$$\text{recall} = \frac{TP}{TP + FN} \qquad (3.10)$$
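Equations (3.9) and (3.10) translate directly into code; the confusion-matrix counts below are an illustrative example, not values from the text.

```python
def precision(tp, fp):
    """Fraction of positive predictions that are actually positive (3.9)."""
    return tp / (tp + fp)

def recall(tp, fn):
    """Fraction of actual positives that were predicted positive (3.10)."""
    return tp / (tp + fn)

tp, fp, fn = 30, 10, 20   # hypothetical counts for the positive class
print(precision(tp, fp))  # 30 / (30 + 10) = 0.75
print(recall(tp, fn))     # 30 / (30 + 20) = 0.6
```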
From the equations, we see that precision measures how often an instance
predicted as positive is actually positive, while recall measures how often
a positive-class instance in the dataset is predicted as positive by the
classifier.
In imbalanced datasets, the goal is to improve recall without hurting precision.
These goals, however, are often conflicting, since in order to increase the TP for
the minority class, the number of FP is also often increased, resulting in reduced
precision.
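The tension between the two metrics can be seen by sweeping a decision threshold over a set of scores: lowering the threshold adds true positives (raising recall) but usually admits false positives as well (lowering precision). The scores and labels below are synthetic, chosen only to illustrate the effect.

```python
def precision_recall_at(scores, labels, threshold):
    """Precision and recall when predicting positive for score >= threshold."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    prec = tp / (tp + fp) if tp + fp else 1.0   # convention: no predictions
    rec = tp / (tp + fn) if tp + fn else 0.0
    return prec, rec

scores = [0.95, 0.9, 0.75, 0.7, 0.6, 0.4, 0.35, 0.2]
labels = [1,    1,   0,    1,   0,   1,   0,    0  ]
for t in (0.8, 0.5, 0.3):
    p, r = precision_recall_at(scores, labels, t)
    print(f"threshold={t}: precision={p:.2f}, recall={r:.2f}")
```

Here recall climbs from 0.50 to 1.00 as the threshold drops, while precision falls from 1.00 toward 0.57, which is the conflict described above.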
In order to obtain a more accurate understanding of the trade-offs between
precision and recall, one can use precision-recall (PR) curves. PR curves are
similar to ROC curves in that they provide a graphical representation of the
performance of classifiers. While the X -axis of ROC curves is FPR and the
Y -axis is TPR, in PR curves, the X -axis is recall and the Y -axis is precision.
Precision-recall curves are therefore related to ROC curves, as recall is the
same as TPR; however, the remaining axes are different. While TPR measures the
fraction of positive examples that are correctly classified, precision measures
the fraction of examples classified as positive that are actually positive.
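Both curves come from the same threshold sweep: each threshold yields one (FPR, TPR) point for the ROC curve and one (recall, precision) point for the PR curve, and recall equals TPR at every threshold. The data below are synthetic, chosen only for illustration.

```python
def curve_points(scores, labels):
    """For each threshold, return (FPR, TPR, recall, precision)."""
    points = []
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        fn = sum(1 for s, y in zip(scores, labels) if s < t and y == 1)
        tn = sum(1 for s, y in zip(scores, labels) if s < t and y == 0)
        tpr = tp / (tp + fn)          # identical to recall
        fpr = fp / (fp + tn)
        prec = tp / (tp + fp)
        points.append((fpr, tpr, tpr, prec))
    return points

scores = [0.9, 0.8, 0.7, 0.4]
labels = [1,   0,   1,   0  ]
for fpr, tpr, rec, prec in curve_points(scores, labels):
    print(f"ROC point ({fpr:.2f}, {tpr:.2f})  PR point ({rec:.2f}, {prec:.2f})")
```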