7.2 Naïve Bayes
Naïve Bayes is a probabilistic classification method based on Bayes' theorem (or
Bayes' law) with a strong simplifying assumption about feature independence.
Bayes' theorem gives the relationship between the probabilities of two events and
their conditional probabilities. Bayes' law is named after the English
mathematician Thomas Bayes.
A naïve Bayes classifier assumes that the presence or absence of a particular feature
of a class is unrelated to the presence or absence of other features. For example,
an object can be classified based on its attributes such as shape, color, and weight.
A reasonable classification for an object that is spherical, yellow, and less than 60
grams in weight may be a tennis ball. Even if these features depend on each other or
upon the existence of the other features, a naïve Bayes classifier considers all these
properties to contribute independently to the probability that the object is a tennis
ball.
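The independence assumption described above can be sketched as a small classifier built from raw counts. This is a minimal illustration, not a production implementation; the training records, feature names, and class labels are all hypothetical, and no smoothing is applied.

```python
from collections import Counter, defaultdict

# Hypothetical training data: categorical features and a class label.
train = [
    ({"shape": "sphere", "color": "yellow", "weight": "light"}, "tennis ball"),
    ({"shape": "sphere", "color": "white", "weight": "light"}, "golf ball"),
    ({"shape": "sphere", "color": "yellow", "weight": "light"}, "tennis ball"),
    ({"shape": "cube", "color": "yellow", "weight": "heavy"}, "box"),
]

def fit(data):
    """Count class frequencies and per-class feature-value frequencies."""
    class_counts = Counter(label for _, label in data)
    # feature_counts[label][feature][value] -> count
    feature_counts = defaultdict(lambda: defaultdict(Counter))
    for features, label in data:
        for f, v in features.items():
            feature_counts[label][f][v] += 1
    return class_counts, feature_counts

def predict(class_counts, feature_counts, features):
    """Score each class as prior * product of per-feature likelihoods.

    Multiplying the per-feature likelihoods is exactly the naive
    independence assumption: each feature contributes on its own.
    """
    total = sum(class_counts.values())
    best_label, best_score = None, -1.0
    for label, c_count in class_counts.items():
        score = c_count / total  # class prior
        for f, v in features.items():
            score *= feature_counts[label][f][v] / c_count
        if score > best_score:
            best_label, best_score = label, score
    return best_label
```

A spherical, yellow, light object would then be scored against each class label, with the highest-scoring label returned.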
The input variables are generally categorical, but variations of the algorithm can
accept continuous variables. There are also ways to convert continuous variables
into categorical ones. This process is often referred to as the discretization of
continuous variables. In the tennis ball example, a continuous variable such as
weight can be grouped into intervals to form a categorical variable. Similarly, an
attribute such as income can be converted into categorical values as shown below.
Low Income: income < $10,000
Working Class: $10,000 ≤ income < $50,000
Middle Class: $50,000 ≤ income < $1,000,000
Upper Class: income ≥ $1,000,000
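The income brackets above can be expressed as a simple discretization function; the thresholds follow the list directly, and the function name is an illustrative choice.

```python
def income_class(income):
    """Discretize a continuous income value into the categories above."""
    if income < 10_000:
        return "Low Income"
    elif income < 50_000:
        return "Working Class"
    elif income < 1_000_000:
        return "Middle Class"
    else:
        return "Upper Class"
```

Note that the boundary values fall into the upper bracket, matching the ≤ and ≥ conditions in the list.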
The output typically includes a class label and its corresponding probability score.
The probability score is not the true probability of the class label, but it is
proportional to the true probability. As shown later in the chapter, in most
implementations, the output includes the log probability for the class, and class
labels are assigned based on the highest values.
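Working with log probabilities can be sketched as follows. The priors and per-feature likelihoods here are made-up numbers for illustration; the point is that summing logs avoids the numerical underflow that multiplying many small probabilities would cause, and the class with the highest log score wins.

```python
import math

# Hypothetical class priors and per-feature likelihoods for two labels.
priors = {"spam": 0.4, "ham": 0.6}
likelihoods = {
    "spam": [0.2, 0.05, 0.1],
    "ham": [0.1, 0.3, 0.2],
}

# Log score = log(prior) + sum of log(likelihood) over the features.
log_scores = {
    label: math.log(priors[label]) + sum(math.log(p) for p in likelihoods[label])
    for label in priors
}

# Assign the class label with the highest log score.
predicted = max(log_scores, key=log_scores.get)
```

Because the logarithm is monotonic, ranking classes by log score gives the same answer as ranking by the (unnormalized) probability score itself.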
Because naïve Bayes classifiers are easy to implement and can execute efficiently
even without prior knowledge of the data, they are among the most popular
algorithms for classifying text documents. Spam filtering is a classic use case of
naïve Bayes text classification. Bayesian spam filtering has become a popular