Summary
This chapter focused on two classification methods: decision trees and naïve Bayes. It discussed the theory behind these classifiers and used a bank marketing example to explain how the methods work in practice. These classifiers, along with logistic regression (Chapter 6), are often used for the classification of data. As this chapter has discussed, each of these methods has its own advantages and disadvantages. How does one pick the most suitable method for a given classification problem? Table 7.8 lists the main considerations when selecting a classifier.
Table 7.8 Choosing a Suitable Classifier

Concern                                                  Recommended Method(s)
-------------------------------------------------------  ----------------------------------
Output of the classification should include class        Logistic regression, decision tree
probabilities in addition to the class labels.
Analysts want to gain insight into how the variables     Logistic regression, decision tree
affect the model.
The problem is high-dimensional.                         Naïve Bayes
Some of the input variables might be correlated.         Logistic regression, decision tree
Some of the input variables might be irrelevant.         Decision tree, naïve Bayes
The data contains categorical variables with a large     Decision tree, naïve Bayes
number of levels.
The data contains mixed variable types.                  Logistic regression, decision tree
There is nonlinear data or discontinuities in the input  Decision tree
variables that would affect the output.
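To make the first row of Table 7.8 concrete, the sketch below fits all three classifiers and reports class probabilities rather than bare labels. It is a minimal illustration using scikit-learn; the synthetic data stands in for the chapter's bank marketing dataset, and the feature counts and hyperparameters are arbitrary choices for demonstration, not values from the chapter.

```python
# A minimal sketch: fit the three classifiers discussed and obtain
# class probabilities (the first concern in Table 7.8).
# Synthetic data is a stand-in for the bank marketing example.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=10, random_state=0)

classifiers = {
    "logistic regression": LogisticRegression(max_iter=1000),
    "decision tree": DecisionTreeClassifier(max_depth=5, random_state=0),
    "naive Bayes": GaussianNB(),
}

for name, clf in classifiers.items():
    clf.fit(X, y)
    # predict_proba returns class probabilities, not just predicted labels
    proba = clf.predict_proba(X[:3])
    print(name, proba.round(3))
```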
After the classification, one can use a few evaluation tools to measure how well a classifier has performed or to compare the performances of multiple classifiers. These tools include the confusion matrix; the true positive, false positive, and false negative rates (TPR, FPR, FNR); precision and recall; and ROC curves together with the area under the curve (AUC).
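The sketch below shows one way these evaluation tools might be computed with scikit-learn. The train/test split, the choice of logistic regression, and the synthetic data are illustrative assumptions, not part of the chapter's example.

```python
# A minimal sketch of the evaluation tools named above, computed on a
# held-out test set. Data and model choice are illustrative assumptions.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, precision_score, recall_score, roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
y_pred = clf.predict(X_test)
y_score = clf.predict_proba(X_test)[:, 1]  # probability of the positive class

# Confusion matrix counts for the binary case: TN, FP, FN, TP
tn, fp, fn, tp = confusion_matrix(y_test, y_pred).ravel()
tpr = tp / (tp + fn)  # true positive rate (equals recall)
fpr = fp / (fp + tn)  # false positive rate
fnr = fn / (fn + tp)  # false negative rate

print("TPR:", tpr, "FPR:", fpr, "FNR:", fnr)
print("precision:", precision_score(y_test, y_pred))
print("recall:", recall_score(y_test, y_pred))
# AUC summarizes the ROC curve in a single number
print("AUC:", roc_auc_score(y_test, y_score))
```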