Summary
This chapter focused on two classification methods: decision trees and naïve Bayes. It discussed the theory behind these classifiers and used a bank marketing example to explain how the methods work in practice. These classifiers, along with logistic regression (Chapter 6), are often used for the classification of data. As this chapter has discussed, each of these methods has its own advantages and disadvantages. How does one pick the most suitable method for a given classification problem? Table 7.8 lists the main considerations when selecting a classifier.
Table 7.8 Choosing a Suitable Classifier

Concern                                                  Recommended Method(s)
-------------------------------------------------------  ----------------------------------
Output of the classification should include class        Logistic regression, decision tree
probabilities in addition to the class labels.
Analysts want to gain insight into how the variables     Logistic regression, decision tree
affect the model.
The problem is high-dimensional.                         Naïve Bayes
Some of the input variables might be correlated.         Logistic regression, decision tree
Some of the input variables might be irrelevant.         Decision tree, naïve Bayes
The data contains categorical variables with a large     Decision tree, naïve Bayes
number of levels.
The data contains mixed variable types.                  Logistic regression, decision tree
There is nonlinear data or discontinuities in the input  Decision tree
variables that would affect the output.
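To make the first row of Table 7.8 concrete, the sketch below fits all three classifiers and reports class probabilities rather than bare labels. It is a minimal illustration using scikit-learn; the synthetic data stands in for the chapter's bank marketing dataset, and the feature counts and hyperparameters are arbitrary choices for demonstration, not values from the chapter.

```python
# A minimal sketch: fit the three classifiers discussed and obtain
# class probabilities (the first concern in Table 7.8).
# Synthetic data is a stand-in for the bank marketing example.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=10, random_state=0)

classifiers = {
    "logistic regression": LogisticRegression(max_iter=1000),
    "decision tree": DecisionTreeClassifier(max_depth=5, random_state=0),
    "naive Bayes": GaussianNB(),
}

for name, clf in classifiers.items():
    clf.fit(X, y)
    # predict_proba returns class probabilities, not just predicted labels
    proba = clf.predict_proba(X[:3])
    print(name, proba.round(3))
```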
After the classification, one can use a few evaluation tools to measure how well a classifier has performed or to compare the performances of multiple classifiers. These tools include the confusion matrix; the true positive, false positive, and false negative rates (TPR, FPR, FNR); precision and recall; and ROC curves together with the area under the curve (AUC).
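The sketch below shows one way these evaluation tools might be computed with scikit-learn. The train/test split, the choice of logistic regression, and the synthetic data are illustrative assumptions, not part of the chapter's example.

```python
# A minimal sketch of the evaluation tools named above, computed on a
# held-out test set. Data and model choice are illustrative assumptions.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, precision_score, recall_score, roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
y_pred = clf.predict(X_test)
y_score = clf.predict_proba(X_test)[:, 1]  # probability of the positive class

# Confusion matrix counts for the binary case: TN, FP, FN, TP
tn, fp, fn, tp = confusion_matrix(y_test, y_pred).ravel()
tpr = tp / (tp + fn)  # true positive rate (equals recall)
fpr = fp / (fp + tn)  # false positive rate
fnr = fn / (fn + tp)  # false negative rate

print("TPR:", tpr, "FPR:", fpr, "FNR:", fnr)
print("precision:", precision_score(y_test, y_pred))
print("recall:", recall_score(y_test, y_pred))
# AUC summarizes the ROC curve in a single number
print("AUC:", roc_auc_score(y_test, y_score))
```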