Just as ROC curves are compared on the basis of AUROC, PR curves are compared on the basis of AUPR. This practice has become more common, as recent research suggests that PR curves (and AUPR) are better discriminators of performance than their ROC (and AUROC) counterparts [38].
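As an illustration, the sketch below (assuming scikit-learn and a hypothetical imbalanced toy dataset, not one discussed in this chapter) compares two off-the-shelf classifiers by both AUROC and AUPR; average_precision_score is used as the usual estimate of AUPR.

    # Sketch: compare classifiers by AUROC and AUPR on imbalanced data.
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import roc_auc_score, average_precision_score
    from sklearn.model_selection import train_test_split

    # Toy data with roughly 5% positives (assumed for illustration only).
    X, y = make_classification(n_samples=5000, weights=[0.95, 0.05], random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

    for name, clf in [("logistic", LogisticRegression(max_iter=1000)),
                      ("forest", RandomForestClassifier(random_state=0))]:
        scores = clf.fit(X_tr, y_tr).predict_proba(X_te)[:, 1]
        print(name,
              "AUROC=%.3f" % roc_auc_score(y_te, scores),
              "AUPR=%.3f" % average_precision_score(y_te, scores))

Because both metrics are computed from the same predicted scores, a ranking of classifiers by AUROC can differ from the ranking by AUPR, which is precisely why the latter is preferred under heavy imbalance.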
3.4.4 Fβ-Measure
A final common metric is the Fβ-measure. The Fβ-measure is a family of metrics that captures the trade-off between precision and recall in a single value reflecting how well a classifier performs in the presence of rare classes. While ROC curves represent the trade-off between different TPRs and FPRs, the Fβ-measure represents the trade-off among different values of TP, FP, and FN [37].
The general equation for the Fβ-measure is:

    F_β = (1 + β²) · (precision · recall) / ((β² · precision) + recall),        (3.11)
where β represents the relative importance of recall versus precision: values of β greater than 1 weight recall more heavily, while values below 1 favor precision. Traditionally, when β is not specified, the F1-measure (the harmonic mean of precision and recall) is assumed.
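As a minimal sketch, the small helper below (hypothetical, not part of any library discussed here) computes the Fβ-measure directly from TP, FP, and FN counts, following Eq. (3.11).

    def f_beta(tp, fp, fn, beta=1.0):
        """F_beta from raw counts, following Eq. (3.11)."""
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = (beta ** 2 * precision) + recall
        return (1 + beta ** 2) * precision * recall / denom if denom else 0.0

    # Example: 30 true positives, 10 false positives, 60 false negatives.
    print(f_beta(30, 10, 60))           # F1 ~ 0.462
    print(f_beta(30, 10, 60, beta=2))   # F2 ~ 0.375; pulled toward the low recall

Note how the F2 value sits closer to the (poor) recall of 0.33 than F1 does, reflecting the heavier weight β = 2 places on recall.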
In spite of its (relatively) useful properties for imbalance, the Fβ-measure is not commonly used when comparing classifiers, as AUROC and AUPR provide more robust performance estimates.
3.5 DISCUSSION
In this chapter, we covered various strategies for learning in imbalanced environments. Specifically, we discussed sampling strategies and skew-insensitive classifiers.
One key observation when choosing between a sampling method and a skew-insensitive classifier is that while sampling methods are a widely applied standard, they require tuning to select the proper sampling level for a given dataset. In general, this is a difficult optimization problem and may prove impractical depending on the size of the dataset and the level of imbalance. In such cases, skew-insensitive classifiers (and ensembles built of skew-insensitive classifiers) offer a reasonable alternative that provides performance comparable to, and often better than, that of the sampling methods.
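As a rough sketch of what this tuning entails (assuming the imbalanced-learn package and a hypothetical imbalanced training set X_tr, y_tr), the sampling level can be treated as a hyperparameter and selected by cross-validation:

    # Sketch: select the SMOTE sampling level by cross-validated AUPR.
    from imblearn.over_sampling import SMOTE
    from imblearn.pipeline import Pipeline
    from sklearn.model_selection import GridSearchCV
    from sklearn.tree import DecisionTreeClassifier

    pipe = Pipeline([("smote", SMOTE(random_state=0)),
                     ("tree", DecisionTreeClassifier(random_state=0))])
    grid = GridSearchCV(pipe,
                        {"smote__sampling_strategy": [0.25, 0.5, 0.75, 1.0]},
                        scoring="average_precision", cv=5)
    grid.fit(X_tr, y_tr)          # X_tr, y_tr: an imbalanced training set
    print(grid.best_params_)      # best minority-to-majority ratio found

Each candidate sampling level requires a full cross-validated fit, which is why this search becomes expensive on large or highly imbalanced datasets.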
When evaluating the performance of the aforementioned models, we learned that accuracy is not a valuable evaluation metric when learning in imbalanced environments. The lack of utility of accuracy (and error rate) stems from the fact that they overemphasize performance on the majority class to the detriment of performance on the minority class. To overcome this issue, we presented multiple alternative evaluation metrics, the most commonly used of which are AUROC and AUPR.
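A small sketch (assuming scikit-learn) makes this failure mode concrete: with 1% positives, a classifier that always predicts the majority class reaches 99% accuracy while never identifying a single minority example.

    import numpy as np
    from sklearn.metrics import accuracy_score, recall_score

    y_true = np.array([1] * 10 + [0] * 990)   # 1% minority class
    y_pred = np.zeros_like(y_true)            # always predict the majority class

    print(accuracy_score(y_true, y_pred))     # 0.99
    print(recall_score(y_true, y_pred))       # 0.0 -- no minority examples found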