is the posterior probability, or a posteriori probability, of H conditioned on X. For example, suppose our world of data tuples is confined to customers described by the attributes age and income, respectively, and that X is a 35-year-old customer with an income of $40,000. Suppose that H is the hypothesis that our customer will buy a computer. Then P(H|X) reflects the probability that customer X will buy a computer given that we know the customer's age and income.
In contrast, P(H) is the prior probability, or a priori probability, of H. For our example, this is the probability that any given customer will buy a computer, regardless of age, income, or any other information, for that matter. The posterior probability, P(H|X), is based on more information (e.g., customer information) than the prior probability, P(H), which is independent of X.
Similarly, P(X|H) is the posterior probability of X conditioned on H. That is, it is the probability that a customer, X, is 35 years old and earns $40,000, given that we know the customer will buy a computer.

P(X) is the prior probability of X. Using our example, it is the probability that a person from our set of customers is 35 years old and earns $40,000.
“How are these probabilities estimated?” P(H), P(X|H), and P(X) may be estimated from the given data, as we shall see next. Bayes' theorem is useful in that it provides a way of calculating the posterior probability, P(H|X), from P(H), P(X|H), and P(X). Bayes' theorem is

\[
P(H \mid X) = \frac{P(X \mid H)\,P(H)}{P(X)}. \tag{8.10}
\]
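To make Eq. (8.10) concrete, here is a minimal numerical sketch in Python. The probability values are hypothetical, chosen only for illustration; they are not derived from any data set in the text.

```python
# Illustrative (made-up) probabilities for the running example:
# H = "customer buys a computer", X = "customer is 35 years old, earns $40,000".
p_h = 0.60          # P(H): prior probability that a customer buys a computer
p_x_given_h = 0.20  # P(X|H): probability of this age/income profile among buyers
p_x = 0.15          # P(X): probability of this age/income profile among all customers

# Bayes' theorem, Eq. (8.10): P(H|X) = P(X|H) * P(H) / P(X)
p_h_given_x = p_x_given_h * p_h / p_x
print(f"P(H|X) = {p_h_given_x:.2f}")  # prints 0.80 with these values
```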
Now that we have that out of the way, in the next section, we will look at how Bayes'
theorem is used in the naïve Bayesian classifier.
8.3.2 Naïve Bayesian Classification
The naïve Bayesian classifier, or simple Bayesian classifier, works as follows:
1. Let D be a training set of tuples and their associated class labels. As usual, each tuple
is represented by an n-dimensional attribute vector, X = (x_1, x_2, ..., x_n), depicting n measurements made on the tuple from n attributes, respectively, A_1, A_2, ..., A_n.
2. Suppose that there are m classes, C_1, C_2, ..., C_m. Given a tuple, X, the classifier will predict that X belongs to the class having the highest posterior probability, conditioned on X. That is, the naïve Bayesian classifier predicts that tuple X belongs to the class C_i if and only if

\[
P(C_i \mid X) > P(C_j \mid X) \quad \text{for } 1 \le j \le m,\ j \ne i.
\]
Thus, we maximize P(C_i|X). The class C_i for which P(C_i|X) is maximized is called the maximum posteriori hypothesis. By Bayes' theorem (Eq. 8.10),

\[
P(C_i \mid X) = \frac{P(X \mid C_i)\,P(C_i)}{P(X)}. \tag{8.11}
\]
 