8.2 A REVIEW OF EVALUATION METRIC FAMILIES AND THEIR
APPLICABILITY TO THE CLASS IMBALANCE PROBLEM
Several machine learning researchers have identified three families of evaluation
metrics used in the context of classification [3, 4]. These are the threshold metrics
(e.g., accuracy and the F-measure), the ranking methods and metrics [e.g., receiver
operating characteristic (ROC) analysis and the area under the ROC curve (AUC)],
and the probabilistic metrics (e.g., root-mean-squared error). The purpose of this
section is to discuss the advantages and disadvantages of these families with
respect to the class imbalance problem.
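To make the distinction between the families concrete, the short Python sketch below (purely illustrative and not taken from [3] or [4]; it assumes scikit-learn is available and uses made-up labels and scores) computes one representative metric from each family on a small imbalanced sample:

    import numpy as np
    from sklearn.metrics import accuracy_score, f1_score, roc_auc_score, mean_squared_error

    # Illustrative ground truth: 90 negatives and 10 positives (a mild imbalance).
    y_true = np.array([0] * 90 + [1] * 10)

    # Illustrative classifier scores (probability of the positive class) and the
    # decisions obtained by thresholding them at 0.5.
    rng = np.random.default_rng(0)
    y_score = np.clip(0.2 * y_true + rng.normal(0.3, 0.15, size=100), 0, 1)
    y_pred = (y_score >= 0.5).astype(int)

    # Threshold metrics: depend on the chosen 0.5 cut-off.
    print("accuracy :", accuracy_score(y_true, y_pred))
    print("F-measure:", f1_score(y_true, y_pred))

    # Ranking metric: depends only on how the scores order the examples.
    print("AUC      :", roc_auc_score(y_true, y_score))

    # Probabilistic metric: scores the probability estimates themselves.
    print("RMSE     :", np.sqrt(mean_squared_error(y_true, y_score)))

The grouping is visible in the sketch: the first two metrics change if the 0.5 threshold changes, the AUC does not, and the root-mean-squared error ignores the threshold altogether and assesses the quality of the probability estimates.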
In assessing both the families and the specific metrics, one must keep in mind
the purpose of the evaluation process. For example, Ferri et al. [3] consider it a
nuisance when a metric lowers an algorithm's overall score because of poor
performance on one or a few infrequent classes; their conclusions are therefore
the opposite of those typically reached by researchers who focus on problems with
class imbalances or cost issues, and for whom good performance on the majority
class usually matters less than good performance on the rarer or more important
class. In this chapter, we side with the latter group of researchers and, in contrast
to Ferri et al. [3], take the position that sensitivity to the misclassification of
infrequent classes is an asset rather than a liability.
As mentioned in [5], many studies have documented the weakness of the most
notable threshold metric, accuracy, in comparison to ranking methods and met-
rics (most notably ROC analysis/AUC) in the case of class imbalances. The first
such study, which in fact brought ROC analysis to the attention of machine learn-
ing researchers, is the one by Provost and Fawcett [6]. The main thrust of their
article is that because the precise nature of the environment in which a machine
learning system will be deployed is unknown, arbitrarily setting the conditions
in which the system will be used is flawed. They argue that a typical measure
such as accuracy does just that because it assumes equal error costs and constant
and relatively balanced class priors, despite the fact that the actual conditions in
which the system will operate are unknown. Their argument applies to more gen-
eral cases than accuracy, but that is the metric they focus on in their discussion.
Instead of systematically using accuracy, they propose an assessment based
on ROC analysis that makes no assumptions about costs or priors, but rather
evaluates the performance of different learning systems under all possible cost
and prior conditions. In fact, they propose a hybrid learning system that selects
a different learner for each cost and prior condition.
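Their argument can be illustrated with a minimal sketch (again in Python with scikit-learn, and not Provost and Fawcett's own experiment): a degenerate classifier that always predicts the majority class obtains a high accuracy simply because of the class prior, while its chance-level AUC reveals that it has no discriminating ability under any cost or prior condition.

    import numpy as np
    from sklearn.metrics import accuracy_score, roc_auc_score

    # Illustrative test set with a 95:5 class prior.
    y_true = np.array([0] * 95 + [1] * 5)

    # Degenerate classifier: always predicts the negative (majority) class and
    # assigns the same score to every example.
    y_pred = np.zeros_like(y_true)
    y_score = np.full(len(y_true), 0.5)

    print("accuracy:", accuracy_score(y_true, y_pred))  # 0.95, driven by the prior
    print("AUC     :", roc_auc_score(y_true, y_score))  # 0.5, i.e., no ranking ability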
A more recent study by Ferri et al. [3] expands this discussion to a large
number of metrics and situations. It is the largest-scale study to date that
pits the different metric families and individual metrics against one another in
various contexts (including the class imbalance one).1 Their study is twofold. In
1 A little earlier, Caruana and Niculescu-Mizil [4] also conducted a large-scale study of the different
families of metrics, but they did not consider the case of class imbalances. We thus do not discuss
their work here.