Advanced Analytical Theory and Methods: Regression - Data Science and Big Data Analytics

Database Reference

In-Depth Information

Then the following R code indicates that the corresponding TPR and FPR values

are 0.91 and 0.29, respectively. Thus, 91% of the customers who will churn are

properly identified, but at a cost of misclassifying 29% of the customers who will

remain.

i <- which(round(alpha,2) == .15)

paste("Threshold=" , (alpha[i]) , " TPR=" , tpr[i] , "

FPR=" , fpr[i])

[1] "Threshold= 0.1543 TPR= 0.9116 FPR= 0.2869"

[2] "Threshold= 0.1518 TPR= 0.9122 FPR= 0.2875"

[3] "Threshold= 0.1479 TPR= 0.9145 FPR= 0.2942"

[4] "Threshold= 0.1455 TPR= 0.9174 FPR= 0.2981"

The ROC curve is useful for evaluating other classifiers and will be utilized again in

Chapter 7, “Advanced Analytical Theory and Methods: Classification.”

Histogram of the Probabilities

It can be useful to visualize the observed responses against the estimated

probabilities provided by the logistic regression. Figure 6.17 provides overlaying

histograms for the customers who churned and for the customers who remained

as customers. With a proper fitting logistic model, the customers who remained

tend to have a low probability of churning. Conversely, the customers who churned

have a high probability of churning again. This histogram plot helps visualize the

number of items to be properly classified or misclassified. In the Churn example,

an ideal histogram plot would have the remaining customers grouped at the left

side of the plot, the customers who churned at the right side of the plot, and no

overlap of these two groups.

Search WWH ::

Custom Search

Home