P(student = yes | buys_computer = no) = 1/5 = 0.200
P(credit_rating = fair | buys_computer = yes) = 6/9 = 0.667
P(credit_rating = fair | buys_computer = no) = 2/5 = 0.400
Using these probabilities, we obtain
P(X | buys_computer = yes) = P(age = youth | buys_computer = yes)
    × P(income = medium | buys_computer = yes)
    × P(student = yes | buys_computer = yes)
    × P(credit_rating = fair | buys_computer = yes)
  = 0.222 × 0.444 × 0.667 × 0.667 = 0.044.
Similarly,
P(X | buys_computer = no) = 0.600 × 0.400 × 0.200 × 0.400 = 0.019.
To find the class, C_i, that maximizes P(X | C_i)P(C_i), we compute

P(X | buys_computer = yes)P(buys_computer = yes) = 0.044 × 0.643 = 0.028
P(X | buys_computer = no)P(buys_computer = no) = 0.019 × 0.357 = 0.007
Therefore, the naïve Bayesian classifier predicts buys_computer = yes for tuple X.
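The arithmetic of this example can be reproduced with a short sketch. The conditional probabilities and priors below are the values computed in the text; the variable names are our own:

```python
# Worked naive Bayes example for tuple X = (age = youth, income = medium,
# student = yes, credit_rating = fair). Values are taken from the text.
cond_probs = {
    "yes": [0.222, 0.444, 0.667, 0.667],  # P(x_k | buys_computer = yes)
    "no":  [0.600, 0.400, 0.200, 0.400],  # P(x_k | buys_computer = no)
}
priors = {"yes": 0.643, "no": 0.357}      # P(buys_computer = C_i)

scores = {}
for cls, probs in cond_probs.items():
    p_x_given_c = 1.0
    for p in probs:
        p_x_given_c *= p          # product, by class-conditional independence
    scores[cls] = p_x_given_c * priors[cls]  # P(X | C_i) P(C_i)

prediction = max(scores, key=scores.get)
print(prediction)  # prints "yes", the class maximizing P(X | C_i)P(C_i)
```

Rounding the two scores gives the 0.028 and 0.007 of the text, so the classifier predicts buys_computer = yes.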
"What if I encounter probability values of zero?" Recall that in Eq. (8.12), we estimate P(X | C_i) as the product of the probabilities P(x_1 | C_i), P(x_2 | C_i), ..., P(x_n | C_i), based on the assumption of class-conditional independence. These probabilities can be estimated from the training tuples (step 4). We need to compute P(X | C_i) for each class (i = 1, 2, ..., m) to find the class C_i for which P(X | C_i)P(C_i) is the maximum (step 5). Let's consider this calculation. For each attribute-value pair (i.e., A_k = x_k, for k = 1, 2, ..., n) in tuple X, we need to count the number of tuples having that attribute-value pair, per class (i.e., per C_i, for i = 1, ..., m). In Example 8.4, we have two classes (m = 2), namely buys_computer = yes and buys_computer = no. Therefore, for the attribute-value pair student = yes of X, say, we need two counts: the number of customers who are students and for which buys_computer = yes (which contributes to P(X | buys_computer = yes)) and the number of customers who are students and for which buys_computer = no (which contributes to P(X | buys_computer = no)).
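This per-class counting can be sketched as follows. The five-tuple toy data set and variable names here are hypothetical, not the training set of Example 8.4:

```python
from collections import Counter

# Hypothetical mini training set of (student, buys_computer) pairs.
data = [("yes", "yes"), ("yes", "yes"), ("no", "yes"),
        ("yes", "no"), ("no", "no")]

# One count per (attribute value, class) pair, as described in the text.
counts = Counter(data)
print(counts[("yes", "yes")])  # students with buys_computer = yes -> 2
print(counts[("yes", "no")])   # students with buys_computer = no  -> 1
```

Dividing each count by the number of tuples in its class yields the estimates P(student = yes | C_i) used in the product.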
But what if, say, there are no training tuples representing students for the class buys_computer = no, resulting in P(student = yes | buys_computer = no) = 0? In other words, what happens if we should end up with a probability value of zero for some P(x_k | C_i)? Plugging this zero value into Eq. (8.12) would return a zero probability for P(X | C_i), even though, without the zero probability, we may have ended up with a high probability, suggesting that X belonged to class C_i! A zero probability cancels the effects of all the other (posteriori) probabilities (on C_i) involved in the product.
There is a simple trick to avoid this problem. We can assume that our training database, D, is so large that adding one to each count that we need would only make a negligible difference in the estimated probability value, yet would conveniently avoid the case of probability values of zero.
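This add-one trick can be sketched as a one-line estimator. The function name and its arguments are our own labels, not from the text:

```python
def smoothed_prob(count, class_count, n_values):
    """Estimate P(x_k | C_i) with one added to each count.

    count:       training tuples in class C_i having attribute value x_k
    class_count: total training tuples in class C_i
    n_values:    number of distinct values attribute A_k can take
    """
    return (count + 1) / (class_count + n_values)

# Without the correction, zero students in the "no" class would give
# P(student = yes | buys_computer = no) = 0/5 = 0, wiping out the product.
print(smoothed_prob(0, 5, 2))  # 1/7 instead of 0
```

Adding one to every count (and the number of distinct attribute values to the denominator) keeps the estimates summing to one while guaranteeing every probability is strictly positive.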