[Figure 8.19 ROC curve for the data in Figure 8.18, plotting true positive rate (TPR) on the y-axis against false positive rate (FPR) on the x-axis, each running from 0.0 to 1.0.]
remaining nine tuples, which are all classified as negative, five actually are negative (thus, TN = 5). The remaining four are all actually positive; thus, FN = 4. We can therefore compute TPR = TP/P = 1/5 = 0.2, while FPR = 0. Thus, we have the point (0.2, 0) for the ROC curve.
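To make this bookkeeping concrete, here is a minimal Python sketch of the confusion counts at this first threshold. The ranked labels and probabilities below are illustrative assumptions consistent with the walk-through (tuple 1 positive at 0.9, tuple 2 positive at 0.8, tuple 3 negative at 0.7, five actual positives in all); they are not the actual values of Figure 8.18, which is not reproduced here.

# Assumed ranked data: (actual class, classifier probability), sorted
# by decreasing probability; NOT the actual contents of Figure 8.18.
ranked = [("P", 0.90), ("P", 0.80), ("N", 0.70), ("P", 0.60), ("P", 0.55),
          ("N", 0.54), ("N", 0.53), ("N", 0.51), ("P", 0.50), ("N", 0.40)]

P = sum(1 for label, _ in ranked if label == "P")  # 5 actual positives
N = len(ranked) - P                                # 5 actual negatives

t = 0.90  # first threshold: only tuple 1 is classified positive
TP = sum(1 for label, prob in ranked if prob >= t and label == "P")  # 1
FP = sum(1 for label, prob in ranked if prob >= t and label == "N")  # 0
FN = P - TP  # 4 positives classified negative
TN = N - FP  # 5 negatives classified negative

TPR, FPR = TP / P, FP / N
print((TPR, FPR))  # (0.2, 0.0) -- the first ROC point

Note that the text writes each point as (TPR, FPR); when plotting, FPR goes on the horizontal axis, as in Figure 8.19.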
Next, threshold t is set to 0.8, the probability value for tuple 2, so this tuple is now also considered positive, while tuples 3 through 10 are considered negative. The actual class label of tuple 2 is positive, thus now TP = 2. The rest of the row can easily be computed, resulting in the point (0.4, 0). Next, we examine the class label of tuple 3 and let t be 0.7, the probability value returned by the classifier for that tuple. Thus, tuple 3 is considered positive, yet its actual label is negative, and so it is a false positive. TP stays the same while FP increments so that FP = 1, giving FPR = FP/N = 1/5 = 0.2. The rest of the values in the row can also be easily computed, yielding the point (0.4, 0.2). The resulting ROC graph, obtained by examining each tuple in turn, is the jagged line shown in Figure 8.19.
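Continuing the same sketch (reusing the assumed ranked, P, and N from above), the whole threshold sweep reduces to walking down the ranked list and, at each tuple, incrementing either the true positive or the false positive count:

def roc_points(ranked, P, N):
    # Sweep the threshold down the ranked list (decreasing probability),
    # emitting one (TPR, FPR) point per tuple, as in the walk-through above.
    points = []
    TP = FP = 0
    for label, _prob in ranked:
        if label == "P":
            TP += 1  # the newly positive prediction is correct
        else:
            FP += 1  # the newly positive prediction is a false positive
        points.append((TP / P, FP / N))
    return points

print(roc_points(ranked, P, N)[:3])
# [(0.2, 0.0), (0.4, 0.0), (0.4, 0.2)] -- the three points derived in the text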
There are many methods for obtaining a curve from these points, the most common of which is to use a convex hull (sketched below). The plot also shows a diagonal line along which, for every true positive of such a model, we are just as likely to encounter a false positive; this line, included for comparison, represents random guessing.
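The text does not spell out how the hull is computed; the following sketch assumes Andrew's monotone-chain algorithm, one standard way to take the upper convex hull, applied to the swept points in plotting order (FPR on the x-axis):

def roc_convex_hull(points):
    # Upper convex hull of ROC points given as (FPR, TPR) pairs,
    # via Andrew's monotone-chain algorithm.
    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])
    pts = sorted(set(points))
    hull = []
    for p in reversed(pts):  # right to left builds the upper hull
        while len(hull) >= 2 and cross(hull[-2], hull[-1], p) <= 0:
            hull.pop()
        hull.append(p)
    return hull[::-1]        # left to right, from (0, 0) to (1, 1)

swept = [(0.0, 0.0)] + roc_points(ranked, P, N)            # include the origin
hull = roc_convex_hull([(fpr, tpr) for tpr, fpr in swept])  # swap to plotting order
print(hull)
# with the assumed data: [(0.0, 0.0), (0.0, 0.4), (0.2, 0.8), (0.8, 1.0), (1.0, 1.0)]

Connecting these hull vertices yields a concave curve lying on or above the jagged line of Figure 8.19.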
Figure 8.20 shows the ROC curves of two classification models. The diagonal line
representing random guessing is also shown. Thus, the closer the ROC curve of a model
is to the diagonal line, the less accurate the model. If the model is really good, initially
we are more likely to encounter true positives as we move down the ranked list; thus, the curve rises steeply from the origin.
 