ASSESSMENT METRICS FOR IMBALANCED LEARNING - Imbalanced Learning: Foundations, Algorithms, and Applications


focus, as also seen in the study by Ferri et al. [3]. Because they do not consider the varying degree of importance placed on the different classes, performance metrics in this category do not fare well in class-imbalanced situations unless the class ratio is specifically taken into consideration. Single-class focus metrics, on the other hand, can be more sensitive to the varying degree of importance placed on the different classes and can, as a result, be naturally better suited to evaluation in class-imbalanced domains. The single-class focus measures discussed in this section are sensitivity/specificity, precision/recall, geometric mean (G-mean), and F-measure. In addition to single-class focus metrics, we will discuss multi-class focus metrics that take class ratios into consideration as a way to moderate each class's contribution to the overall result. We will also present a survey of more experimental metrics that were proposed recently but have not yet enjoyed much exposure in the community.
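The following small sketch illustrates why a metric that ignores the class ratio can mislead on imbalanced data. It assumes a hypothetical test set with 10 positive and 990 negative examples and uses overall accuracy as an assumed example of a metric that does not take the class ratio into account. Sensitivity, specificity, and G-mean use their standard definitions, written in terms of the true/false positive and negative counts (TP, FN, FP, TN) of a binary confusion matrix. The helper name `metrics` and all counts are illustrative only.

```python
import math


def metrics(tp: int, fn: int, fp: int, tn: int) -> dict:
    """Accuracy plus single-class focus metrics from binary confusion-matrix counts."""
    total = tp + fn + fp + tn
    accuracy = (tp + tn) / total
    # Sensitivity: fraction of actual positives that were predicted positive.
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    # Specificity: fraction of actual negatives that were predicted negative.
    specificity = tn / (tn + fp) if tn + fp else 0.0
    # G-mean: geometric mean of sensitivity and specificity.
    g_mean = math.sqrt(sensitivity * specificity)
    return {"accuracy": accuracy, "sensitivity": sensitivity,
            "specificity": specificity, "g_mean": g_mean}


# Hypothetical test set: 10 positives, 990 negatives.
# A trivial classifier predicts "negative" for every example.
trivial = metrics(tp=0, fn=10, fp=0, tn=990)
print(trivial)
# {'accuracy': 0.99, 'sensitivity': 0.0, 'specificity': 1.0, 'g_mean': 0.0}

# A hypothetical classifier that finds 8 of the 10 positives
# at the cost of 50 false alarms among the negatives.
better = metrics(tp=8, fn=2, fp=50, tn=940)
print(better["accuracy"], round(better["g_mean"], 2))
# 0.948 0.87
```

Under these assumptions, the trivial classifier reaches 0.99 accuracy while detecting no positives at all, which sensitivity and G-mean expose immediately. The second classifier has lower accuracy but a far higher G-mean, because it actually recognizes most of the minority class.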

All the metrics discussed in this section are based on the concept of the confusion matrix. The confusion matrix for classifier f records the number of examples of each class that were correctly classified as belonging to that class by classifier f, as well as the number of examples of each class that were misclassified. For the misclassified examples, the confusion matrix considers every possible kind of misclassification and records the number of examples that fall in each category. For example, in a three-class problem, the following confusion matrix tells us that a examples of class A, e examples of class B, and i examples of class C were correctly classified by f. However, b + c examples of class A were wrongly classified by f, b of which were mistakenly assigned to class B and c of which were mistakenly assigned to class C (and similarly for classes B and C).

                  Predicted class A   Predicted class B   Predicted class C
Actual class A            a                   b                   c
Actual class B            d                   e                   f
Actual class C            g                   h                   i
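A minimal sketch of how such a matrix can be tallied, assuming the true and predicted labels are available as two parallel Python lists. The function name `confusion_matrix` and the toy labels are hypothetical, chosen only for illustration; rows are actual classes and columns are predicted classes, matching the layout of the table above.

```python
from collections import Counter


def confusion_matrix(actual, predicted, classes):
    # Count how often each (actual, predicted) pair occurs; missing pairs count as 0.
    counts = Counter(zip(actual, predicted))
    # Row k = actual class k, column j = predicted class j.
    return [[counts[(a, p)] for p in classes] for a in classes]


classes = ["A", "B", "C"]
actual = ["A", "A", "A", "B", "B", "C", "C", "C"]
predicted = ["A", "B", "C", "B", "A", "C", "C", "B"]
m = confusion_matrix(actual, predicted, classes)
for cls, row in zip(classes, m):
    print(cls, row)
# A [1, 1, 1]
# B [1, 1, 0]
# C [0, 1, 2]

# Diagonal entries (a, e, i in the table) are the correctly classified examples.
correct = {cls: m[k][k] for k, cls in enumerate(classes)}
# The rest of each row (e.g. b + c for class A) counts that class's misclassifications.
misclassified = {cls: sum(m[k]) - m[k][k] for k, cls in enumerate(classes)}
print(correct)        # {'A': 1, 'B': 1, 'C': 2}
print(misclassified)  # {'A': 2, 'B': 1, 'C': 1}
```

In practice, libraries such as scikit-learn offer `sklearn.metrics.confusion_matrix(y_true, y_pred, labels=...)`, which uses the same rows-are-actual, columns-are-predicted convention; the hand-rolled version above only makes the counting explicit.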

In the binary case, the above matrix reduces to a 2 × 2 format, and the question of which class a misclassified example is assigned to disappears, since only one possibility remains. In this case, specific names are given to both classes (positive and negative) and to the entries of the confusion matrix (true positive, false negative, false positive, and true negative), as shown in the following:

                  Predicted positive     Predicted negative
Actual positive   True positive (TP)     False negative (FN)
Actual negative   False positive (FP)    True negative (TN)
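As a sketch of how the four named entries can be counted from labels, the hypothetical helper `binary_counts` below returns them in the row order of the 2 × 2 table (TP, FN, FP, TN). It assumes labels are given as parallel lists and that one label is designated positive; treating every other label as negative is an assumption for illustration, which in the strictly binary case simply means the single remaining class.

```python
def binary_counts(actual, predicted, positive):
    """Return (TP, FN, FP, TN) for the label `positive`; any other label is negative."""
    tp = fn = fp = tn = 0
    for y, y_hat in zip(actual, predicted):
        if y == positive:
            # Actual positive: first row of the table.
            if y_hat == positive:
                tp += 1
            else:
                fn += 1
        else:
            # Actual negative: second row of the table.
            if y_hat == positive:
                fp += 1
            else:
                tn += 1
    return tp, fn, fp, tn


# Toy labels, 1 = positive and 0 = negative (illustrative only).
actual = [1, 1, 0, 0, 0, 1, 0, 0]
predicted = [1, 0, 0, 1, 0, 1, 0, 0]
print(binary_counts(actual, predicted, positive=1))  # (2, 1, 1, 4)
```

The four counts always sum to the number of examples, since every example falls into exactly one cell of the table.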
