curves, in order to compare two classifiers, both the class distribution and misclassification costs must be known or estimated. They argue that misclassification costs are often difficult to estimate, and that B-ROC analysis allows them to bypass the issue of misclassification cost estimation altogether.
8.5 CONCLUSION
The purpose of this chapter was to discuss the issue of classifier evaluation in the case of class-imbalanced datasets. The discussion focused on evaluation metrics and graphical methods, both those in common use and those more recently proposed and less well known. In particular, it looked at the following well-known single-class focus metrics belonging to the threshold category: sensitivity and specificity, precision and recall, and their respective combinations, the G-mean and the F-measure. It then considered a multi-class focus threshold metric that combines the partial accuracies of both classes, the macro-average accuracy. Still within the threshold category, other newer and more experimental combination metrics were surveyed: mean-class-weighted accuracy, optimized precision, the AGms, and the IBA. The chapter then discussed the following methods and metrics belonging to the ranking category: ROC analysis, cost curves, PR curves, and the AUC. Still in the ranking category, the chapter discussed newer and more experimental methods and metrics: the H-measure, the Area under the Precision-Recall Curve and the Integrated Area under the ROC Curve, and B-ROC analysis.
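
To make the threshold metrics concrete, the following minimal sketch in Python (with NumPy) computes them from a binary confusion matrix. The helper name imbalance_metrics and the example counts are ours, not the chapter's, and the F-measure shown is the balanced (F1) variant that weights precision and recall equally.

    import numpy as np

    def imbalance_metrics(tp, fn, fp, tn):
        """Threshold metrics from a binary confusion matrix.

        The positive (minority) class is taken to be the class of interest.
        """
        sensitivity = tp / (tp + fn)          # recall on the positive class
        specificity = tn / (tn + fp)          # recall on the negative class
        precision = tp / (tp + fp)
        g_mean = np.sqrt(sensitivity * specificity)
        f_measure = 2 * precision * sensitivity / (precision + sensitivity)
        macro_avg_accuracy = (sensitivity + specificity) / 2
        return {
            "sensitivity": sensitivity,
            "specificity": specificity,
            "precision": precision,
            "G-mean": g_mean,
            "F-measure": f_measure,
            "macro-average accuracy": macro_avg_accuracy,
        }

    # Hypothetical test set of 100 examples, 10 of them positive: overall
    # accuracy is 92%, yet only half of the minority class is recovered.
    print(imbalance_metrics(tp=5, fn=5, fp=3, tn=87))

In this hypothetical example, the G-mean (about 0.70), F-measure (about 0.56), and macro-average accuracy (about 0.73) all expose a weakness on the minority class that the 92% overall accuracy conceals.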
Mainly because aspects of classifier evaluation other than evaluation metrics and graphical methods have received very little discussion in the context of class imbalance, this chapter did not delve into the issues of error estimation and statistical testing, which are also important aspects of the evaluation process. We will now say a few words about these issues, which warrant much greater attention in the future.
In the case of error estimation, the main method used in the machine learning community is 10-fold cross-validation. As mentioned in the introduction, when such a method is used, the random partitioning of the data into 10 subsets can distort the results: some of the subsets may contain few or even no instances of the minority class if the dataset is extremely imbalanced or very small. This is well documented, and it is common, in the case of class imbalances in particular, to use stratified 10-fold cross-validation, which ensures that the proportion of positive to negative examples found in the original distribution is respected in all the folds (see [1] for a description of this approach). Going beyond this simple issue, however, both Japkowicz and Shah [1] and Raeder et al. [2] discuss the issue of error estimation in the general case. They both conclude that while 10-fold cross-validation is quite reliable in most cases, more research is necessary to establish ideal resampling strategies for all settings. Raeder et al. [2] specifically mention the class imbalance situation, where they suggest that additional experiments are needed.
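
As a small illustration of why stratification matters, the sketch below (in Python, using scikit-learn's KFold and StratifiedKFold; the 10%-minority dataset is synthetic and of our own making) counts how many minority-class examples land in each test fold under the two partitioning schemes.

    import numpy as np
    from sklearn.model_selection import KFold, StratifiedKFold

    # Synthetic, imbalanced dataset (hypothetical): 100 examples, only 10
    # of which belong to the positive (minority) class.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 2))
    y = np.array([1] * 10 + [0] * 90)

    splitters = {
        "plain 10-fold": KFold(n_splits=10, shuffle=True, random_state=0),
        "stratified 10-fold": StratifiedKFold(n_splits=10, shuffle=True,
                                              random_state=0),
    }

    for name, splitter in splitters.items():
        # Number of minority-class examples in each test fold.
        positives = [int(y[test_idx].sum())
                     for _, test_idx in splitter.split(X, y)]
        print(f"{name}: positives per test fold = {positives}")

    # Plain 10-fold CV may leave some test folds with few or no minority
    # examples, making per-fold sensitivity or precision undefined;
    # stratified 10-fold CV preserves the 10% class proportion in every
    # fold (exactly one positive per fold here).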