OA is usually adopted in the traditional learning scenario, that is, static datasets
with balanced class distribution, to evaluate the performance of algorithms. However,
when the context changes to imbalanced learning, other metrics are more appropriate
for such evaluation [19], among which the receiver operating characteristic
(ROC) curve and the area under the ROC curve (AUROC) are the most strongly
recommended [36].
On the basis of the confusion matrix as defined in Figure 7.4, one can calculate
the TP rate and FP rate as follows:
\[ \mathrm{TP\ rate} = \frac{TP}{P_R} = \frac{TP}{TP + FN} \tag{7.32} \]
\[ \mathrm{FP\ rate} = \frac{FP}{N_R} = \frac{FP}{FP + TN} \tag{7.33} \]
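As a minimal sketch of Equations (7.32) and (7.33), the two rates can be computed directly from the four confusion-matrix counts. The function name `rates` and the example counts are illustrative assumptions, not taken from the text.

```python
def rates(tp, fn, fp, tn):
    """Return (tp_rate, fp_rate) from confusion-matrix counts."""
    positives = tp + fn  # P_R in Eq. (7.32)
    negatives = fp + tn  # N_R in Eq. (7.33)
    if positives == 0 or negatives == 0:
        raise ValueError("both classes must be present")
    return tp / positives, fp / negatives


# Hypothetical imbalanced test set: 50 minority and 950 majority examples.
tp_rate, fp_rate = rates(tp=40, fn=10, fp=30, tn=920)
print(tp_rate, fp_rate)
```

With these made-up counts, a hard-type classifier would map to the single ROC point (fp_rate, tp_rate).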
ROC space is established by plotting TP rate over FP rate. Generally speak-
ing, hard-type classifiers (those that output only discrete class labels) correspond
to points in ROC space (FP rate, TP rate). On the other hand, soft-type classi-
fiers (those that output a likelihood that an instance belongs to either class label)
correspond to curves in ROC space. Such curves are formulated by adjusting
the decision threshold to generate a series of points in ROC space. For example,
if the likelihoods of an unlabeled instance x_k belonging to the minority class and
the majority class are 0.3 and 0.7, respectively, the natural decision threshold d = 0.5
would classify x_k as a majority class example because 0.3 < d. However, d could also
be set to other values, for example, d = 0.2. In this case, x_k would be classified
as a minority class example because 0.3 > d. By tuning d from 0 to 1 with a small
step, for example, 0.01, a series of pair-wise points (FP rate, TP rate)
could be created in ROC space. In order to assess the performance of different
classifiers in this case, one generally uses AUROC as an evaluation criterion;
it is defined as the area between the ROC curve and the horizontal axis (axis
representing FP rate).
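The threshold sweep and the area computation described above can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: the function names, the toy scores, the explicit (0, 0) and (1, 1) endpoints, and the use of the trapezoidal rule for the area are all choices made here.

```python
def classify(score, d):
    """Label an instance from its minority-class likelihood and threshold d."""
    return "minority" if score > d else "majority"


def roc_points(scores, labels, step=0.01):
    """Sweep d from 0 to 1 and collect the (fp_rate, tp_rate) points.

    scores: minority-class likelihoods; labels: 1 = minority, 0 = majority.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    # Assumption: always include the trivial endpoints of ROC space.
    points = {(0.0, 0.0), (1.0, 1.0)}
    for i in range(int(round(1 / step)) + 1):
        d = i * step
        tp = sum(1 for s, y in zip(scores, labels) if s > d and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s > d and y == 0)
        points.add((fp / neg, tp / pos))
    return sorted(points)


def auroc(points):
    """Area between the ROC curve and the FP-rate axis (trapezoidal rule)."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Toy data (hypothetical): three minority and four majority instances.
scores = [0.9, 0.8, 0.3, 0.6, 0.4, 0.2, 0.1]
labels = [1, 1, 1, 0, 0, 0, 0]
points = roc_points(scores, labels)
print(classify(0.3, 0.5), classify(0.3, 0.2))
print(auroc(points))
```

Sorting the points works here because raising d can only lower both rates, so the swept points are monotone in ROC space.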
In order to reflect the ROC curve characteristics for all random runs, the
vertical averaging approach [36] is adopted to plot the averaged ROC curves.
Implementation of the vertical averaging method is illustrated in Figure 7.5.
Assume one would like to average two ROC curves, l_1 and l_2; both are formed
by a series of points in the ROC space. The first step is to evenly divide the range
of FP rate into a set of intervals. Then, at each interval, find the corresponding
TP rate values of each ROC curve and average them. In Figure 7.5, X_1 and
Y_1 are the points from l_1 and l_2 corresponding to the interval FP rate 1. By
averaging their TP rate values, the corresponding ROC point Z_1 on the averaged
ROC curve is obtained. However, some ROC curves do not have
corresponding points on certain intervals. In this case, one can use linear
interpolation to obtain the averaged ROC points. For instance, in Figure
7.5, the point X (corresponding to FP rate 2) is calculated on the basis of the
linear interpolation of the two neighboring points X_2 and X_3. Once X is obtained,
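The vertical averaging procedure with linear interpolation can be sketched roughly as below. The helper names `tp_at` and `vertical_average`, the two toy curves, and the rule of taking the highest TP rate when a curve has several points at the same FP rate are assumptions made for illustration; they are not specified in the text.

```python
def tp_at(curve, fp):
    """TP rate of one ROC curve at a given FP rate.

    curve: list of (fp_rate, tp_rate) points sorted by fp_rate.
    """
    # Assumption: with several points at this FP rate, take the highest.
    exact = [y for x, y in curve if x == fp]
    if exact:
        return max(exact)
    # Otherwise interpolate linearly between the two neighboring points.
    for (x0, y0), (x1, y1) in zip(curve, curve[1:]):
        if x0 < fp < x1:
            return y0 + (y1 - y0) * (fp - x0) / (x1 - x0)
    raise ValueError("fp lies outside the range covered by the curve")


def vertical_average(curves, n_intervals=10):
    """Average TP rates of several ROC curves on an even FP-rate grid."""
    grid = [i / n_intervals for i in range(n_intervals + 1)]
    return [
        (fp, sum(tp_at(c, fp) for c in curves) / len(curves))
        for fp in grid
    ]


# Two hypothetical ROC curves standing in for l_1 and l_2.
l1 = [(0.0, 0.0), (0.5, 1.0), (1.0, 1.0)]
l2 = [(0.0, 0.0), (0.25, 0.5), (1.0, 1.0)]
averaged = vertical_average([l1, l2], n_intervals=4)
for fp, tp in averaged:
    print(fp, tp)
```

In this toy case l1 has no point at FP rate 0.25, so its TP rate there is interpolated from its neighbors at 0 and 0.5 before being averaged with the value from l2.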