this lack of knowledge, where “robust” means that the metrics yield good results
over a wide variety of assumptions. If these metrics are to be useful for learning
from imbalanced datasets, they will tend to value the minority class much more
than accuracy does, since accuracy is now widely recognized as a poor metric
when learning from imbalanced data. This recognition has led to the rise of new
metrics to replace accuracy for learning from imbalanced data.
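To make this concrete, the following sketch (with invented class proportions) shows how a trivial classifier that always predicts the majority class can achieve very high accuracy on an imbalanced dataset while identifying none of the minority examples.

```python
# Invented example: 1% minority class; the classifier always predicts the majority.
y_true = [0] * 990 + [1] * 10      # 0 = majority class, 1 = minority class
y_pred = [0] * 1000                # trivially predict the majority class everywhere

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(accuracy)                    # 0.99, yet no minority example is identified
```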
A variety of metrics are routinely used when learning from imbalanced data
in situations where accurate evaluation information is not available. The most common metric
involves receiver operating characteristic (ROC) analysis and the area under
the ROC curve (AUC) [17, 18]. ROC analysis can sometimes identify optimal
models and discard suboptimal ones independent of the cost context or the class
distribution (i.e., if one ROC curve dominates another), although in practice ROC
curves tend to intersect, so that there is no one dominant model. ROC analysis
does not have any bias toward models that perform well on the majority class at
the expense of the minority class, a property that is quite attractive when dealing
with imbalanced data. AUC summarizes this information into a single number,
which facilitates model comparison when there is no dominating ROC curve.
Recently, there has been some criticism concerning the use of ROC analysis for
model comparison [19], but this measure nonetheless remains the most common
metric used for learning from imbalanced data.
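As an illustration, the sketch below computes an ROC curve and its AUC directly from classifier scores; the labels and scores are invented, the minority class is treated as the positive class, and tied scores are ignored for simplicity.

```python
def roc_points(labels, scores):
    """Sweep the score threshold and return (FPR, TPR) points of the ROC curve."""
    pairs = sorted(zip(scores, labels), reverse=True)   # highest-scoring examples first
    pos = sum(labels)                                   # total positives (minority class)
    neg = len(labels) - pos                             # total negatives (majority class)
    tp = fp = 0
    points = [(0.0, 0.0)]
    for _, label in pairs:
        if label == 1:
            tp += 1
        else:
            fp += 1
        points.append((fp / neg, tp / pos))
    return points

def auc(points):
    """Area under the ROC curve via the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

labels = [1, 0, 1, 0, 0, 0, 1, 0, 0, 0]                    # 3 minority, 7 majority examples
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.5, 0.4, 0.3, 0.2, 0.1]
print("AUC =", round(auc(roc_points(labels, scores)), 3))  # AUC = 0.762
```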
Other common metrics used for imbalanced learning are based on precision
and recall. The precision of classification rules is essentially the accuracy asso-
ciated with those rules, while the recall of a set of rules (or a classifier) is
the percentage of examples of a designated class that are correctly predicted.
For imbalanced learning, recall is typically used to measure the coverage of the
minority class. Thus, precision and recall make it possible to assess the perfor-
mance of a classifier on the minority class. Typically, one generates precision
and recall curves by considering alternative classifiers. Just as AUC is used for
model comparison in ROC analysis, there are metrics that combine precision
and recall into a single number to facilitate comparisons between models. These
include the geometric mean (the square root of precision times recall) and the
F-measure [20]. The F-measure is parameterized and can be adjusted to specify
the relative importance of precision versus recall, but the F1-measure, which
weights precision and recall equally, is the variant most often used when learning
from imbalanced data.
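The sketch below, again with invented predictions, computes these quantities for the minority class: precision, recall, the geometric mean, and the parameterized F-measure (with F1 as the equal-weight case).

```python
from math import sqrt

# Invented predictions; class 1 is the minority class and is treated as positive.
y_true = [1, 0, 1, 0, 0, 0, 1, 0, 0, 0]
y_pred = [1, 0, 0, 0, 1, 0, 1, 0, 0, 0]

tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)  # true positives
fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)  # false positives
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)  # false negatives

precision = tp / (tp + fp)         # accuracy of the predictions made for class 1
recall = tp / (tp + fn)            # coverage of the minority class
g_mean = sqrt(precision * recall)  # geometric mean of precision and recall

def f_beta(p, r, beta=1.0):
    """Parameterized F-measure; beta > 1 favors recall, beta < 1 favors precision."""
    return (1 + beta ** 2) * p * r / (beta ** 2 * p + r)

print(f"precision={precision:.2f} recall={recall:.2f} "
      f"G-mean={g_mean:.2f} F1={f_beta(precision, recall):.2f}")
```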
It is also important to use appropriate evaluation metrics for unsupervised
learning tasks that must handle imbalanced data. As described earlier, associa-
tion rule mining treats all items equally even though rare items are often more
important than common ones. Various evaluation metrics have been proposed to
deal with this imbalance, and algorithms have been developed to mine association
rules that satisfy these metrics. One simple metric assigns a fixed weight to
each item to represent its importance, perhaps its per-unit profit [21]. A slightly
more sophisticated metric allows this weight to vary based on the transaction it
appears in, which can be used to reflect the quantity of the item [22, 23]. But
such measures still cannot represent simple metrics such as total profit. Utility
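A minimal sketch of one such item-weighted measure appears below; the item weights (standing in for per-unit profit) and the transactions are invented, and the formula, ordinary support scaled by the average weight of the items in the itemset, is only in the spirit of the metrics in [21-23] rather than an exact definition from those papers.

```python
# Invented item weights (per-unit profit) and transactions; the rare item is
# given a high weight so that rules involving it are not dismissed as infrequent.
weights = {"milk": 1.0, "bread": 1.0, "caviar": 20.0}
transactions = [
    {"milk", "bread"},
    {"milk", "bread", "caviar"},
    {"bread"},
    {"milk", "caviar"},
]

def weighted_support(itemset, transactions, weights):
    """Ordinary support scaled by the average weight of the items in the itemset."""
    count = sum(1 for t in transactions if itemset <= t)           # containing transactions
    support = count / len(transactions)                            # ordinary support
    avg_weight = sum(weights[i] for i in itemset) / len(itemset)   # itemset importance
    return support * avg_weight

print(weighted_support({"milk", "caviar"}, transactions, weights))  # 0.5 * 10.5 = 5.25
```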