Advanced Analytical Theory and Methods: Classification - Data Science and Big Data Analytics


data efficiently. Related research [13] shows that the naïve Bayes classifier is,
in many cases, competitive with other learning algorithms, including decision
trees and neural networks. In some cases, naïve Bayes even outperforms other
methods.

Unlike logistic regression, the naïve Bayes classifier can handle categorical
variables with many levels. Recall that decision trees can handle categorical
variables as well, but too many levels may result in a deep tree. Overall, the
naïve Bayes classifier performs better than decision trees on categorical
variables with many levels. Compared to decision trees, naïve Bayes is also more
resistant to overfitting, especially when a smoothing technique is applied.
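
As a rough illustration of why smoothing helps, the sketch below assumes add-one (Laplace) smoothing as the smoothing technique and uses made-up counts for one categorical attribute within a single class. The names counts, p_raw, and p_smooth are hypothetical and exist only for this example.

```r
# Hypothetical counts of one categorical attribute's levels within one class.
counts <- c(low = 0, medium = 3, high = 7)

# Unsmoothed conditional probabilities: the unseen level "low" gets
# probability 0, which would zero out the whole naive Bayes product.
p_raw <- counts / sum(counts)

# Laplace (add-one) smoothing: add 1 to every count before normalizing.
laplace <- 1
p_smooth <- (counts + laplace) / (sum(counts) + laplace * length(counts))

print(round(p_raw, 3))
print(round(p_smooth, 3))
```

With smoothing, every level keeps a small nonzero probability, so a level that never appeared with a class in the training data cannot single-handedly rule that class out.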

Despite its benefits, naïve Bayes also comes with a few disadvantages. Naïve
Bayes assumes that the variables in the data are conditionally independent.
Therefore, it is sensitive to correlated variables because the algorithm may
double-count their effects. As an example, assume that people with low income and
low credit tend to default. If the task is to score “default” based on both income
and credit as two separate attributes, naïve Bayes would count the evidence shared
by the two attributes twice when scoring the default outcome, thus reducing the
accuracy of the prediction.
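
The double-counting effect can be sketched with a small hand calculation. All numbers below are hypothetical, and the example assumes the extreme case in which credit is perfectly correlated with income, so the second attribute carries no new information. The helper posterior is written only for this illustration.

```r
# Hypothetical numbers, chosen only for illustration.
prior <- c(default = 0.5, no_default = 0.5)

# P(income = low | class). Credit is assumed to be perfectly correlated with
# income, so P(credit = low | class) repeats the same values and adds no new
# information.
p_low_income <- c(default = 0.8, no_default = 0.4)
p_low_credit <- p_low_income

# Naive Bayes score: prior times the product of the conditional
# probabilities, normalized so the class scores sum to 1.
posterior <- function(prior, ...) {
  score <- Reduce(`*`, list(...), prior)
  score / sum(score)
}

post_income_only <- posterior(prior, p_low_income)
post_both <- posterior(prior, p_low_income, p_low_credit)

print(round(post_income_only, 3))
print(round(post_both, 3))
```

Under these assumptions the score for "default" rises from about 0.667 with income alone to 0.8 once the redundant credit attribute is included, even though no new evidence was added.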

Although probabilities are provided as part of the output for the prediction,
naïve Bayes classifiers in general are not very reliable for probability
estimation and should be used only for assigning class labels. Naïve Bayes in its
simple form is used only with categorical variables. Any continuous variable
should be converted into a categorical variable through the process known as
discretization, as shown earlier. In common statistical software packages,
however, naïve Bayes is implemented in a way that enables it to handle continuous
variables as well.
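
One possible way to discretize a continuous variable in base R is the cut function. The sketch below uses hypothetical income values (in thousands) and break points chosen only for illustration; suitable bins depend on the data.

```r
# Hypothetical incomes in thousands; values are made up for this example.
income <- c(12, 35, 48, 61, 75, 90, 120)

# Discretize into three bins: (0, 40], (40, 80], (80, Inf].
# The break points are assumptions, not values from the text.
income_level <- cut(income,
                    breaks = c(0, 40, 80, Inf),
                    labels = c("low", "medium", "high"),
                    right = TRUE)

print(table(income_level))
```

The result is a factor, which can then be treated like any other categorical variable.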

7.2.5 Naïve Bayes in R

This section explores two methods of using the naïve Bayes classifier in R. The
first method is to build the classifier from scratch by manually computing the
probability scores, and the second method is to use the naiveBayes function from
the e1071 package.

The examples show how to use naïve Bayes to predict whether employees would
enroll in an onsite educational program.

In R, first set up the working directory and initialize the packages.

setwd("c:/")                # set the working directory
install.packages("e1071")   # install package e1071 (needed only once)
library(e1071)              # load the library
