7.2 Naïve Bayes
Naïve Bayes is a probabilistic classification method based on Bayes' theorem (or
Bayes' law) with a strong simplifying assumption about feature independence.
Bayes' theorem gives the relationship between the probabilities of two events and
their conditional probabilities. Bayes' law is named after the English
mathematician Thomas Bayes.
A naïve Bayes classifier assumes that the presence or absence of a particular feature
of a class is unrelated to the presence or absence of other features. For example,
an object can be classified based on its attributes such as shape, color, and weight.
A reasonable classification for an object that is spherical, yellow, and less than 60
grams in weight may be a tennis ball. Even if these features depend on each other or
upon the existence of the other features, a naïve Bayes classifier considers all these
properties to contribute independently to the probability that the object is a tennis
ball.
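The independence assumption described above can be sketched as a small classifier built from raw counts. This is a minimal illustration, not a production implementation; the training records, feature names, and class labels are all hypothetical, and no smoothing is applied.

```python
from collections import Counter, defaultdict

# Hypothetical training data: categorical features and a class label.
train = [
    ({"shape": "sphere", "color": "yellow", "weight": "light"}, "tennis ball"),
    ({"shape": "sphere", "color": "white", "weight": "light"}, "golf ball"),
    ({"shape": "sphere", "color": "yellow", "weight": "light"}, "tennis ball"),
    ({"shape": "cube", "color": "yellow", "weight": "heavy"}, "box"),
]

def fit(data):
    """Count class frequencies and per-class feature-value frequencies."""
    class_counts = Counter(label for _, label in data)
    # feature_counts[label][feature][value] -> count
    feature_counts = defaultdict(lambda: defaultdict(Counter))
    for features, label in data:
        for f, v in features.items():
            feature_counts[label][f][v] += 1
    return class_counts, feature_counts

def predict(class_counts, feature_counts, features):
    """Score each class as prior * product of per-feature likelihoods.

    Multiplying the per-feature likelihoods is exactly the naive
    independence assumption: each feature contributes on its own.
    """
    total = sum(class_counts.values())
    best_label, best_score = None, -1.0
    for label, c_count in class_counts.items():
        score = c_count / total  # class prior
        for f, v in features.items():
            score *= feature_counts[label][f][v] / c_count
        if score > best_score:
            best_label, best_score = label, score
    return best_label
```

A spherical, yellow, light object would then be scored against each class label, with the highest-scoring label returned.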
The input variables are generally categorical, but variations of the algorithm can
accept continuous variables. There are also ways to convert continuous variables
into categorical ones. This process is often referred to as the discretization of
continuous variables. In the tennis ball example, a continuous variable such as
weight can be grouped into intervals to form a categorical variable. Similarly, an
attribute such as income can be converted into categorical values as shown below.
Low Income: income < $10,000
Working Class: $10,000 ≤ income < $50,000
Middle Class: $50,000 ≤ income < $1,000,000
Upper Class: income ≥ $1,000,000
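The income brackets above can be expressed as a simple discretization function; the thresholds follow the list directly, and the function name is an illustrative choice.

```python
def income_class(income):
    """Discretize a continuous income value into the categories above."""
    if income < 10_000:
        return "Low Income"
    elif income < 50_000:
        return "Working Class"
    elif income < 1_000_000:
        return "Middle Class"
    else:
        return "Upper Class"
```

Note that the boundary values fall into the upper bracket, matching the ≤ and ≥ conditions in the list.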
The output typically includes a class label and its corresponding probability score.
The probability score is not the true probability of the class label, but it is
proportional to the true probability. As shown later in the chapter, in most
implementations, the output includes the log probability for the class, and class
labels are assigned based on the highest values.
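Working with log probabilities can be sketched as follows. The priors and per-feature likelihoods here are made-up numbers for illustration; the point is that summing logs avoids the numerical underflow that multiplying many small probabilities would cause, and the class with the highest log score wins.

```python
import math

# Hypothetical class priors and per-feature likelihoods for two labels.
priors = {"spam": 0.4, "ham": 0.6}
likelihoods = {
    "spam": [0.2, 0.05, 0.1],
    "ham": [0.1, 0.3, 0.2],
}

# Log score = log(prior) + sum of log(likelihood) over the features.
log_scores = {
    label: math.log(priors[label]) + sum(math.log(p) for p in likelihoods[label])
    for label in priors
}

# Assign the class label with the highest log score.
predicted = max(log_scores, key=log_scores.get)
```

Because the logarithm is monotonic, ranking classes by log score gives the same answer as ranking by the (unnormalized) probability score itself.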
Because naïve Bayes classifiers are easy to implement and can execute efficiently
even without prior knowledge of the data, they are among the most popular
algorithms for classifying text documents. Spam filtering is a classic use case of
naïve Bayes text classification. Bayesian spam filtering has become a popular