Advanced Analytical Theory and Methods: Regression - Data Science and Big Data Analytics


type="l" )
axis(side=4)
mtext(side=4, line=3, "False positive rate")
text(0.18,0.18,"FPR")
text(0.58,0.58,"TPR")

Figure 6.16 The effect of the threshold value in the churn example

For a threshold value of 0, every item is classified as a positive outcome, so the TPR value is 1. However, all the negatives are also classified as positive, so the FPR value is also 1. As the threshold value increases, more and more negative class labels are assigned, and the FPR and TPR values decrease (they never increase). When the threshold reaches 1, no positive labels are assigned, and the FPR and TPR values are both 0.
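To make the endpoints concrete, the following minimal R sketch sweeps a few thresholds over a small, made-up set of predicted probabilities. The vectors `prob` and `truth` and the helper `rates_at()` are hypothetical illustrations, not the Churn data or the book's code. The sketch assumes an item is labeled positive when its probability is at least the threshold.

```r
# Hypothetical toy data (NOT the Churn dataset): predicted churn
# probabilities and true labels (1 = churned, 0 = stayed)
prob  <- c(0.05, 0.12, 0.20, 0.35, 0.45, 0.52, 0.61, 0.70, 0.83, 0.95)
truth <- c(0,    0,    1,    0,    0,    1,    0,    1,    1,    1)

# Hypothetical helper: label positive when prob >= threshold, then compute
# TPR (share of actual positives labeled positive) and
# FPR (share of actual negatives labeled positive)
rates_at <- function(threshold, prob, truth) {
  pred_pos <- prob >= threshold
  c(threshold = threshold,
    TPR = sum(pred_pos & truth == 1) / sum(truth == 1),
    FPR = sum(pred_pos & truth == 0) / sum(truth == 0))
}

# Sweep thresholds from 0 to 1; no toy probability equals 1,
# so nothing is labeled positive at the last threshold
thresholds <- seq(0, 1, by = 0.25)
rates <- t(sapply(thresholds, rates_at, prob = prob, truth = truth))
print(rates)
```

On this toy data the sweep starts at TPR = FPR = 1 for threshold 0 and ends at TPR = FPR = 0 for threshold 1. Both rates fall, and never rise, in between.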

For the purposes of a classifier, a commonly used threshold value is 0.5. A positive label is assigned for any probability of 0.5 or greater; otherwise, a negative label is assigned. As the following R code illustrates, in the analysis of the Churn dataset, the 0.5 threshold corresponds to a TPR value of 0.56 and an FPR value of 0.08.

# locate the threshold that rounds to 0.5
i <- which(round(alpha,2) == .5)
# report the threshold with its TPR and FPR
paste("Threshold=", alpha[i], " TPR=", tpr[i], " FPR=", fpr[i])

[1] "Threshold= 0.5004 TPR= 0.5571 FPR= 0.0793"
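The lookup above relies on vectors `alpha`, `tpr`, and `fpr` created earlier in the analysis and not shown in this excerpt. The following is a hedged sketch that assumes the candidate thresholds are the distinct predicted probabilities. Under that assumption, comparable vectors could be built in base R as shown. The data are again hypothetical toy values, not the Churn results.

```r
# Hypothetical toy data (NOT the Churn dataset)
prob  <- c(0.05, 0.12, 0.20, 0.35, 0.45, 0.52, 0.61, 0.70, 0.83, 0.95)
truth <- c(0,    0,    1,    0,    0,    1,    0,    1,    1,    1)

# Assumption: candidate thresholds are the distinct predicted probabilities,
# sorted from highest to lowest, with "positive if prob >= threshold"
alpha <- sort(unique(prob), decreasing = TRUE)
tpr <- sapply(alpha, function(a) sum(prob >= a & truth == 1) / sum(truth == 1))
fpr <- sapply(alpha, function(a) sum(prob >= a & truth == 0) / sum(truth == 0))

# round(alpha, 2) == .5 can match zero or several thresholds;
# taking the threshold closest to 0.5 always yields exactly one index
i <- which.min(abs(alpha - 0.5))
paste("Threshold=", alpha[i], " TPR=", tpr[i], " FPR=", fpr[i])
```

On this toy data, `which(round(alpha, 2) == .5)` returns `integer(0)` because no threshold rounds to 0.50, so the exact-match lookup would find nothing. By contrast, `which.min(abs(alpha - 0.5))` selects the nearest threshold, 0.52 here, with TPR = 0.8 and FPR = 0.2. Using `which.min()` is a choice made for this sketch, not the book's method.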

Thus, 56% of customers who will churn are correctly classified with the Churn label, and 8% of the customers who will stay are incorrectly labeled as Churn. If identifying only 56% of the churners is not acceptable, then the threshold could be lowered. For example, suppose it was decided to classify with a Churn label any customer with a probability of churning greater than 0.15.
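The trade-off from lowering the threshold can be sketched on hypothetical toy data. This minimal R example does not reproduce the Churn results; it compares the 0.5 rule with a 0.15 rule. Both rules label a customer positive when the probability is at least the threshold. No toy probability sits exactly at either threshold, so `>=` and `>` give the same labels here. The helper `rates_at()` is a hypothetical illustration.

```r
# Hypothetical toy data (NOT the Churn dataset): predicted churn
# probabilities and true labels (1 = churned, 0 = stayed)
prob  <- c(0.05, 0.12, 0.20, 0.35, 0.45, 0.52, 0.61, 0.70, 0.83, 0.95)
truth <- c(0,    0,    1,    0,    0,    1,    0,    1,    1,    1)

# Hypothetical helper: TPR and FPR when "churn" means prob >= threshold
rates_at <- function(threshold) {
  pred_pos <- prob >= threshold
  c(TPR = sum(pred_pos & truth == 1) / sum(truth == 1),
    FPR = sum(pred_pos & truth == 0) / sum(truth == 0))
}

# Compare the default 0.5 cutoff with a lower 0.15 cutoff
cmp <- rbind("t = 0.50" = rates_at(0.50),
             "t = 0.15" = rates_at(0.15))
print(cmp)
```

Here, lowering the threshold from 0.5 to 0.15 raises TPR from 0.8 to 1.0 but also raises FPR from 0.2 to 0.6. More churners are caught at the cost of more false alarms.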
