difficult to achieve the top-left corner, but a better classifier's curve should come closer to the top left, setting it apart from classifiers whose curves lie nearer the diagonal line.
Related to the ROC curve is the area under the curve (AUC), obtained by measuring the area beneath the ROC curve. A higher AUC indicates a better-performing classifier. The score ranges from 0.5 (the diagonal line TPR = FPR, no better than random guessing) to 1.0 (an ROC curve passing through the top-left corner).
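To make the definition concrete, the short sketch below computes the AUC directly from its probabilistic interpretation: the AUC equals the probability that a randomly chosen positive instance receives a higher score than a randomly chosen negative one. The scores and labels here are made up for illustration and are not from the bank marketing data.
# hypothetical scores and class labels, for illustration only
score <- c(0.9, 0.8, 0.7, 0.55, 0.4, 0.3)
label <- c(TRUE, TRUE, FALSE, TRUE, FALSE, FALSE)
pos <- score[label]    # scores of the positive instances
neg <- score[!label]   # scores of the negative instances
# fraction of positive/negative pairs ranked correctly (ties count half)
auc <- mean(outer(pos, neg, ">") + 0.5 * outer(pos, neg, "=="))
auc   # 8/9, or about 0.889, for these made-up values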
In the bank marketing example, the training set includes 2,000 instances, and an additional 100 instances are included as the testing set. Figure 7.10 shows the ROC curve of the naïve Bayes classifier built on the training set of 2,000 instances and tested on the testing set of 100 instances. The figure is generated by the following R script. The ROCR package is required for plotting the ROC curve, and the e1071 package provides the naiveBayes() function used to build the classifier. The 2,000 training instances are in a data frame called banktrain, and the additional 100 instances are in a data frame called banktest.
library(ROCR)
library(e1071)   # provides naiveBayes()

# training set
banktrain <- read.table("bank-sample.csv", header=TRUE, sep=",")

# drop a few columns
drops <- c("balance", "day", "campaign", "pdays",
           "previous", "month")
banktrain <- banktrain[, !(names(banktrain) %in% drops)]

# testing set
banktest <- read.table("bank-sample-test.csv", header=TRUE, sep=",")
banktest <- banktest[, !(names(banktest) %in% drops)]

# build the naïve Bayes classifier on the training set
nb_model <- naiveBayes(subscribed ~ .,
                       data=banktrain)

# score the testing set, requesting raw class probabilities
nb_prediction <- predict(nb_model,
                         # remove column "subscribed"
                         banktest[, -ncol(banktest)],
                         type='raw')
score <- nb_prediction[, "yes"]
actual_class <- banktest$subscribed == 'yes'

# compute TPR and FPR at every score threshold and plot the ROC curve
pred <- prediction(score, actual_class)
perf <- performance(pred, "tpr", "fpr")
plot(perf)
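Continuing from the pred object above, the script can also report the AUC discussed earlier; ROCR's performance() function accepts "auc" as a measure, and the value is stored in the y.values slot of the resulting object. The following lines sketch that call.
# extract the AUC from the same prediction object
auc <- performance(pred, measure = "auc")
auc@y.values[[1]]   # area under the ROC curve shown in Figure 7.10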