data efficiently. Related research [13] shows that the naïve Bayes classifier is in
many cases competitive with other learning algorithms, including decision trees and
neural networks. In some cases naïve Bayes even outperforms other methods.
Unlike logistic regression, the naïve Bayes classifier can handle categorical
variables with many levels. Recall that decision trees can handle categorical
variables as well, but too many levels may result in a deep tree. Overall, the naïve
Bayes classifier performs better than decision trees on categorical variables with
many levels. Compared to decision trees, naïve Bayes is also more resistant to
overfitting, especially when a smoothing technique is applied.
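To make the role of smoothing concrete, here is a minimal sketch in base R. It assumes Laplace (add-one) smoothing as the technique, and the counts are hypothetical, chosen only for illustration. Without smoothing, a level that never appears with a class gets a conditional probability of zero, which zeroes out the whole product for that class.

```r
# Hypothetical counts of a categorical attribute within one class.
# Level "c" was never observed with this class in the training data.
counts <- c(a = 6, b = 4, c = 0)

# Unsmoothed estimate of P(level | class): level "c" gets probability 0
p_raw <- counts / sum(counts)

# Laplace smoothing (assumed here): add epsilon to every count
epsilon <- 1
p_smooth <- (counts + epsilon) / (sum(counts) + epsilon * length(counts))

print(round(p_raw, 3))
print(round(p_smooth, 3))
```

The smoothed estimates still sum to 1, but no level is ruled out entirely on the basis of a small training sample.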
Despite its benefits, naïve Bayes also comes with a few disadvantages. Naïve
Bayes assumes the variables in the data are conditionally independent. Therefore,
it is sensitive to correlated variables because the algorithm may double count their
effects. As an example, assume that people with low income and low credit tend
to default. If the task is to score “default” based on both income and credit as two
separate attributes, and the two attributes are correlated, naïve Bayes would count
what is largely the same evidence twice, thus reducing the accuracy of the prediction.
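The double-counting effect in the income and credit example can be sketched numerically in base R. All probabilities below are hypothetical. The sketch also assumes the extreme case in which low credit carries no information beyond low income, so the correct posterior is the one computed from income alone.

```r
# Hypothetical prior and conditional probabilities, for illustration only
prior        <- c(default = 0.2, no_default = 0.8)
p_low_income <- c(default = 0.7, no_default = 0.3)  # P(low income | class)
p_low_credit <- c(default = 0.7, no_default = 0.3)  # P(low credit | class)

# Normalize a vector of class scores into posterior probabilities
posterior <- function(score) score / sum(score)

# Posterior using income alone
post_one <- posterior(prior * p_low_income)

# Posterior when naive Bayes multiplies in credit as independent evidence
post_two <- posterior(prior * p_low_income * p_low_credit)

print(round(post_one, 3))
print(round(post_two, 3))
```

Under these assumed numbers, the probability of default rises from about 0.37 to about 0.58 once the redundant attribute is multiplied in, which is enough to flip the predicted label even though no new information was added.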
Although probabilities are provided as part of the output for the prediction, naïve
Bayes classifiers in general are not very reliable for probability estimation and
should be used only for assigning class labels. Naïve Bayes in its simple form is
used only with categorical variables. Any continuous variable should be converted
into a categorical variable through the process known as discretization, as shown
earlier. In common statistical software packages, however, naïve Bayes is
implemented in a way that enables it to handle continuous variables as well.
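As a brief reminder of what discretization looks like in practice, the following base R sketch bins a continuous variable with cut(). The income values, break points, and labels are hypothetical and are not taken from the text.

```r
# Hypothetical annual incomes in thousands; values are illustrative only
income <- c(12, 35, 48, 75, 110, 64)

# Bin the continuous values into three categories.
# Break points are arbitrary assumptions; intervals are (0,40], (40,80], (80,Inf].
income_level <- cut(income,
                    breaks = c(0, 40, 80, Inf),
                    labels = c("low", "medium", "high"))

# Count how many observations fall into each category
print(table(income_level))
```

The resulting factor can be used like any other categorical attribute when computing conditional probabilities.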
7.2.5 Naïve Bayes in R
This section explores two methods of using the naïve Bayes classifier in R. The first
method is to build the classifier from scratch by manually computing the probability
scores, and the second method is to use the naiveBayes function from the e1071 package.
The examples show how to use naïve Bayes to predict whether employees would
enroll in an onsite educational program.
In R, first set up the working directory and initialize the packages.
setwd("c:/")
install.packages("e1071") # install package e1071
library(e1071) # load the library