P(student = yes | buys_computer = no) = 1/5 = 0.200
P(credit_rating = fair | buys_computer = yes) = 6/9 = 0.667
P(credit_rating = fair | buys_computer = no) = 2/5 = 0.400
Using these probabilities, we obtain
P(X | buys_computer = yes) = P(age = youth | buys_computer = yes)
    × P(income = medium | buys_computer = yes)
    × P(student = yes | buys_computer = yes)
    × P(credit_rating = fair | buys_computer = yes)
  = 0.222 × 0.444 × 0.667 × 0.667 = 0.044.
Similarly,
P(X | buys_computer = no) = 0.600 × 0.400 × 0.200 × 0.400 = 0.019.
To find the class, C_i, that maximizes P(X | C_i)P(C_i), we compute

P(X | buys_computer = yes)P(buys_computer = yes) = 0.044 × 0.643 = 0.028
P(X | buys_computer = no)P(buys_computer = no) = 0.019 × 0.357 = 0.007
Therefore, the naïve Bayesian classifier predicts buys_computer = yes for tuple X.
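The arithmetic of this example can be reproduced with a short sketch. The conditional probabilities and priors below are the values computed in the text; the variable names are our own:

```python
# Worked naive Bayes example for tuple X = (age = youth, income = medium,
# student = yes, credit_rating = fair). Values are taken from the text.
cond_probs = {
    "yes": [0.222, 0.444, 0.667, 0.667],  # P(x_k | buys_computer = yes)
    "no":  [0.600, 0.400, 0.200, 0.400],  # P(x_k | buys_computer = no)
}
priors = {"yes": 0.643, "no": 0.357}      # P(buys_computer = C_i)

scores = {}
for cls, probs in cond_probs.items():
    p_x_given_c = 1.0
    for p in probs:
        p_x_given_c *= p          # product, by class-conditional independence
    scores[cls] = p_x_given_c * priors[cls]  # P(X | C_i) P(C_i)

prediction = max(scores, key=scores.get)
print(prediction)  # prints "yes", the class maximizing P(X | C_i)P(C_i)
```

Rounding the two scores gives the 0.028 and 0.007 of the text, so the classifier predicts buys_computer = yes.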
"What if I encounter probability values of zero?" Recall that in Eq. (8.12), we estimate P(X | C_i) as the product of the probabilities P(x_1 | C_i), P(x_2 | C_i), ..., P(x_n | C_i), based on the assumption of class-conditional independence. These probabilities can be estimated from the training tuples (step 4). We need to compute P(X | C_i) for each class (i = 1, 2, ..., m) to find the class C_i for which P(X | C_i)P(C_i) is the maximum (step 5). Let's consider this calculation. For each attribute-value pair (i.e., A_k = x_k, for k = 1, 2, ..., n) in tuple X, we need to count the number of tuples having that attribute-value pair, per class (i.e., per C_i, for i = 1, ..., m). In Example 8.4, we have two classes (m = 2), namely buys_computer = yes and buys_computer = no. Therefore, for the attribute-value pair student = yes of X, say, we need two counts: the number of customers who are students and for which buys_computer = yes (which contributes to P(X | buys_computer = yes)) and the number of customers who are students and for which buys_computer = no (which contributes to P(X | buys_computer = no)).
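This per-class counting can be sketched as follows. The five-tuple toy data set and variable names here are hypothetical, not the training set of Example 8.4:

```python
from collections import Counter

# Hypothetical mini training set of (student, buys_computer) pairs.
data = [("yes", "yes"), ("yes", "yes"), ("no", "yes"),
        ("yes", "no"), ("no", "no")]

# One count per (attribute value, class) pair, as described in the text.
counts = Counter(data)
print(counts[("yes", "yes")])  # students with buys_computer = yes -> 2
print(counts[("yes", "no")])   # students with buys_computer = no  -> 1
```

Dividing each count by the number of tuples in its class yields the estimates P(student = yes | C_i) used in the product.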
But what if, say, there are no training tuples representing students for the class buys_computer = no, resulting in P(student = yes | buys_computer = no) = 0? In other words, what happens if we should end up with a probability value of zero for some P(x_k | C_i)? Plugging this zero value into Eq. (8.12) would return a zero probability for P(X | C_i), even though, without the zero probability, we may have ended up with a high probability, suggesting that X belonged to class C_i! A zero probability cancels the effects of all the other (posteriori) probabilities (on C_i) involved in the product.
There is a simple trick to avoid this problem. We can assume that our training database, D, is so large that adding one to each count that we need would only make a negligible difference in the estimated probability value, yet would conveniently avoid the case of probability values of zero.
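This add-one trick can be sketched as a one-line estimator. The function name and its arguments are our own labels, not from the text:

```python
def smoothed_prob(count, class_count, n_values):
    """Estimate P(x_k | C_i) with one added to each count.

    count:       training tuples in class C_i having attribute value x_k
    class_count: total training tuples in class C_i
    n_values:    number of distinct values attribute A_k can take
    """
    return (count + 1) / (class_count + n_values)

# Without the correction, zero students in the "no" class would give
# P(student = yes | buys_computer = no) = 0/5 = 0, wiping out the product.
print(smoothed_prob(0, 5, 2))  # 1/7 instead of 0
```

Adding one to every count (and the number of distinct attribute values to the denominator) keeps the estimates summing to one while guaranteeing every probability is strictly positive.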