this lack of knowledge, where “robust” means that the metrics yield good results
over a wide variety of assumptions. If these metrics are to be useful for learning
from imbalanced datasets, they will tend to value the minority class much more
than accuracy does, since accuracy is now widely recognized as a poor metric
when learning from imbalanced data. This recognition has led to the rise of new
metrics to replace accuracy for learning from imbalanced data.
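To make this concrete, the following sketch (with invented class proportions) shows how a trivial classifier that always predicts the majority class can achieve very high accuracy on an imbalanced dataset while identifying none of the minority examples.

```python
# Invented example: 1% minority class; the classifier always predicts the majority.
y_true = [0] * 990 + [1] * 10      # 0 = majority class, 1 = minority class
y_pred = [0] * 1000                # trivially predict the majority class everywhere

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(accuracy)                    # 0.99, yet no minority example is identified
```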
A variety of metrics are routinely used when learning from imbalanced data
in situations where accurate evaluation information is not available. The most common metric
involves receiver operating characteristic (ROC) analysis and the area under
the ROC curve (AUC) [17, 18]. ROC analysis can sometimes identify optimal
models and discard suboptimal ones independent of the cost context or the class
distribution (i.e., if one ROC curve dominates another), although in practice ROC
curves tend to intersect, so that there is no one dominant model. ROC analysis
does not have any bias toward models that perform well on the majority class at
the expense of the minority class, a property that is quite attractive when dealing
with imbalanced data. AUC summarizes this information into a single number,
which facilitates model comparison when there is no dominating ROC curve.
Recently, there has been some criticism concerning the use of ROC analysis for
model comparison [19], but this measure nonetheless remains the most common
metric used for learning from imbalanced data.
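As an illustration, the sketch below computes an ROC curve and its AUC directly from classifier scores; the labels and scores are invented, the minority class is treated as the positive class, and tied scores are ignored for simplicity.

```python
def roc_points(labels, scores):
    """Sweep the score threshold and return (FPR, TPR) points of the ROC curve."""
    pairs = sorted(zip(scores, labels), reverse=True)   # highest-scoring examples first
    pos = sum(labels)                                   # total positives (minority class)
    neg = len(labels) - pos                             # total negatives (majority class)
    tp = fp = 0
    points = [(0.0, 0.0)]
    for _, label in pairs:
        if label == 1:
            tp += 1
        else:
            fp += 1
        points.append((fp / neg, tp / pos))
    return points

def auc(points):
    """Area under the ROC curve via the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

labels = [1, 0, 1, 0, 0, 0, 1, 0, 0, 0]                    # 3 minority, 7 majority examples
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.5, 0.4, 0.3, 0.2, 0.1]
print("AUC =", round(auc(roc_points(labels, scores)), 3))  # AUC = 0.762
```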
Other common metrics used for imbalanced learning are based on precision
and recall. The precision of classification rules is essentially the accuracy asso-
ciated with those rules, while the recall of a set of rules (or a classifier) is
the percentage of examples of a designated class that are correctly predicted.
For imbalanced learning, recall is typically used to measure the coverage of the
minority class. Thus, precision and recall make it possible to assess the perfor-
mance of a classifier on the minority class. Typically, one generates precision
and recall curves by considering alternative classifiers. Just as AUC is used for
model comparison in ROC analysis, there are metrics that combine precision
and recall into a single number to facilitate comparisons between models. These
include the geometric mean (the square root of precision times recall) and the
F-measure [20]. The F-measure is parameterized and can be adjusted to specify
the relative importance of precision versus recall, but the F1-measure, which
weights precision and recall equally, is the variant most often used when learning
from imbalanced data.
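The sketch below, again with invented predictions, computes these quantities for the minority class: precision, recall, the geometric mean, and the parameterized F-measure (with F1 as the equal-weight case).

```python
from math import sqrt

# Invented predictions; class 1 is the minority class and is treated as positive.
y_true = [1, 0, 1, 0, 0, 0, 1, 0, 0, 0]
y_pred = [1, 0, 0, 0, 1, 0, 1, 0, 0, 0]

tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)  # true positives
fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)  # false positives
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)  # false negatives

precision = tp / (tp + fp)         # accuracy of the predictions made for class 1
recall = tp / (tp + fn)            # coverage of the minority class
g_mean = sqrt(precision * recall)  # geometric mean of precision and recall

def f_beta(p, r, beta=1.0):
    """Parameterized F-measure; beta > 1 favors recall, beta < 1 favors precision."""
    return (1 + beta ** 2) * p * r / (beta ** 2 * p + r)

print(f"precision={precision:.2f} recall={recall:.2f} "
      f"G-mean={g_mean:.2f} F1={f_beta(precision, recall):.2f}")
```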
It is also important to use appropriate evaluation metrics for unsupervised
learning tasks that must handle imbalanced data. As described earlier, associa-
tion rule mining treats all items equally even though rare items are often more
important than common ones. Various evaluation metrics have been proposed to
deal with this imbalance, and algorithms have been developed to mine association
rules that satisfy these metrics. One simple metric assigns a fixed weight to
each item to represent its importance, perhaps its per-unit profit [21]. A slightly
more sophisticated metric allows this weight to vary based on the transaction it
appears in, which can be used to reflect the quantity of the item [22, 23]. But
such measures still cannot represent simple metrics such as total profit. Utility
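A minimal sketch of one such item-weighted measure appears below; the item weights (standing in for per-unit profit) and the transactions are invented, and the formula, ordinary support scaled by the average weight of the items in the itemset, is only in the spirit of the metrics in [21-23] rather than an exact definition from those papers.

```python
# Invented item weights (per-unit profit) and transactions; the rare item is
# given a high weight so that rules involving it are not dismissed as infrequent.
weights = {"milk": 1.0, "bread": 1.0, "caviar": 20.0}
transactions = [
    {"milk", "bread"},
    {"milk", "bread", "caviar"},
    {"bread"},
    {"milk", "caviar"},
]

def weighted_support(itemset, transactions, weights):
    """Ordinary support scaled by the average weight of the items in the itemset."""
    count = sum(1 for t in transactions if itemset <= t)           # containing transactions
    support = count / len(transactions)                            # ordinary support
    avg_weight = sum(weights[i] for i in itemset) / len(itemset)   # itemset importance
    return support * avg_weight

print(weighted_support({"milk", "caviar"}, transactions, weights))  # 0.5 * 10.5 = 5.25
```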