Advanced Analytical Theory and Methods: Regression - Data Science and Big Data Analytics

In general, this log-likelihood ratio test is particularly useful for forward and

backward step-wise methods to add variables to or remove them from the

proposed logistic regression model.

Receiver Operating Characteristic (ROC) Curve

Logistic regression is often used as a classifier to assign class labels to a person,

item, or transaction based on the predicted probability provided by the model. In

the Churn example, a customer can be classified with the label called Churn if the

logistic model predicts a high probability that the customer will churn. Otherwise,

a Remain label is assigned to the customer. Commonly, 0.5 is used as the default

probability threshold to distinguish between any two class labels. However, any

threshold value can be used depending on the preference to avoid false positives

(for example, predicting Churn when the customer will actually Remain) or false

negatives (for example, predicting Remain when the customer will actually Churn).
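
To make the effect of the threshold concrete, the following is a minimal sketch in base R. The probabilities are made-up values rather than output from the Churn model, and `label_at` is a hypothetical helper name introduced only for illustration.

```r
# Made-up predicted churn probabilities for five customers (illustrative only).
prob <- c(0.10, 0.35, 0.55, 0.70, 0.90)

# Hypothetical helper: assign Churn when the probability exceeds the threshold,
# otherwise assign Remain.
label_at <- function(p, threshold = 0.5) {
  ifelse(p > threshold, "Churn", "Remain")
}

labels_default <- label_at(prob)        # default threshold of 0.5
labels_strict  <- label_at(prob, 0.8)   # higher threshold: fewer Churn labels
labels_lenient <- label_at(prob, 0.3)   # lower threshold: more Churn labels

print(labels_default)
print(labels_strict)
print(labels_lenient)
```

Raising the threshold can only shrink the set of customers labeled Churn, which trades false positives for false negatives; lowering it does the opposite.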

In general, for two class labels, C and ¬C, where “¬C” denotes “not C,” some

working definitions and formulas follow:

• True Positive: predict C, when actually C

• True Negative: predict ¬C, when actually ¬C

• False Positive: predict C, when actually ¬C

• False Negative: predict ¬C, when actually C

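\[ \text{False Positive Rate (FPR)} = \frac{\text{number of false positives}}{\text{number of negatives}} = \frac{FP}{FP + TN} \tag{6.16} \]

\[ \text{True Positive Rate (TPR)} = \frac{\text{number of true positives}}{\text{number of positives}} = \frac{TP}{TP + FN} \tag{6.17} \]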
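
As a small worked example of these definitions, the sketch below counts the four outcomes and computes both rates in base R, taking Churn as class C. The labels and probabilities are invented for illustration and are not taken from the Churn dataset; the rates use the standard definitions TPR = TP / (TP + FN) and FPR = FP / (FP + TN).

```r
# Made-up actual labels and predicted probabilities for eight customers.
actual <- c("Churn", "Churn", "Churn", "Remain",
            "Remain", "Remain", "Remain", "Churn")
prob   <- c(0.9, 0.8, 0.4, 0.6, 0.2, 0.1, 0.3, 0.7)

# Classify with the common default threshold of 0.5.
predicted <- ifelse(prob > 0.5, "Churn", "Remain")

# Count the four outcomes, treating Churn as the positive class C.
TP <- sum(predicted == "Churn"  & actual == "Churn")   # predict C, actually C
TN <- sum(predicted == "Remain" & actual == "Remain")  # predict not C, actually not C
FP <- sum(predicted == "Churn"  & actual == "Remain")  # predict C, actually not C
FN <- sum(predicted == "Remain" & actual == "Churn")   # predict not C, actually C

TPR <- TP / (TP + FN)  # share of actual positives that were caught
FPR <- FP / (FP + TN)  # share of actual negatives that were flagged

print(c(TP = TP, TN = TN, FP = FP, FN = FN))
print(c(TPR = TPR, FPR = FPR))
```

Each threshold yields one (FPR, TPR) pair; repeating the calculation across many thresholds traces out the points of a ROC curve.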

The plot of the True Positive Rate (TPR) against the False Positive Rate (FPR)

is known as the Receiver Operating Characteristic (ROC) curve. Using the

ROCR package, the following R commands generate the ROC curve for the Churn

example:

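library(ROCR)

# Predicted churn probabilities from the fitted logistic model
pred = predict(Churn_logistic3, type="response")

# Pair the predicted probabilities with the actual outcomes
predObj = prediction(pred, churn_input$Churned)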

rocObj = performance(predObj, measure="tpr",
