P(student = yes | buys_computer = no) = 1/5 = 0.200
P(credit_rating = fair | buys_computer = yes) = 6/9 = 0.667
P(credit_rating = fair | buys_computer = no) = 2/5 = 0.400
Using these probabilities, we obtain
P(X | buys_computer = yes) = P(age = youth | buys_computer = yes)
    × P(income = medium | buys_computer = yes)
    × P(student = yes | buys_computer = yes)
    × P(credit_rating = fair | buys_computer = yes)
    = 0.222 × 0.444 × 0.667 × 0.667 = 0.044.
Similarly,
P(X | buys_computer = no) = 0.600 × 0.400 × 0.200 × 0.400 = 0.019.
To find the class, C_i, that maximizes P(X|C_i)P(C_i), we compute

P(X | buys_computer = yes) P(buys_computer = yes) = 0.044 × 0.643 = 0.028
P(X | buys_computer = no) P(buys_computer = no) = 0.019 × 0.357 = 0.007
Therefore, the naïve Bayesian classifier predicts buys_computer = yes for tuple X.
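The arithmetic above is easy to reproduce programmatically. The following is a minimal Python sketch (not from the text; the variable and function names are illustrative) that scores tuple X against both classes using the conditional probabilities above and the class priors P(buys_computer = yes) = 9/14 = 0.643 and P(buys_computer = no) = 5/14 = 0.357.

# Class priors P(C_i): 9 of the 14 training tuples are "yes", 5 are "no".
priors = {"yes": 9 / 14, "no": 5 / 14}  # 0.643 and 0.357

# Conditional probabilities P(x_k | C_i) for the attribute values of
# X = (age = youth, income = medium, student = yes, credit_rating = fair).
cond = {
    "yes": {"age=youth": 2 / 9, "income=medium": 4 / 9,
            "student=yes": 6 / 9, "credit_rating=fair": 6 / 9},
    "no":  {"age=youth": 3 / 5, "income=medium": 2 / 5,
            "student=yes": 1 / 5, "credit_rating=fair": 2 / 5},
}

def score(label):
    """Compute P(X | C_i) * P(C_i) under class-conditional independence."""
    p = priors[label]
    for prob in cond[label].values():
        p *= prob
    return p

scores = {label: score(label) for label in priors}
print(scores)                       # {'yes': ~0.028, 'no': ~0.007}
print(max(scores, key=scores.get))  # 'yes' -- the class predicted for X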
“What if I encounter probability values of zero?” Recall that in Eq. (8.12), we estimate P(X|C_i) as the product of the probabilities P(x_1|C_i), P(x_2|C_i), ..., P(x_n|C_i), based on the assumption of class-conditional independence. These probabilities can be estimated from the training tuples (step 4). We need to compute P(X|C_i) for each class (i = 1, 2, ..., m) to find the class C_i for which P(X|C_i)P(C_i) is the maximum (step 5). Let's consider this calculation. For each attribute-value pair (i.e., A_k = x_k, for k = 1, 2, ..., n) in tuple X, we need to count the number of tuples having that attribute-value pair, per class (i.e., per C_i, for i = 1, ..., m). In Example 8.4, we have two classes (m = 2), namely buys_computer = yes and buys_computer = no. Therefore, for the attribute-value pair student = yes of X, say, we need two counts: the number of customers who are students and for which buys_computer = yes (which contributes to P(X | buys_computer = yes)) and the number of customers who are students and for which buys_computer = no (which contributes to P(X | buys_computer = no)). A sketch of this counting step appears below.
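These per-class counts can be collected in a single pass over the training data. The following Python sketch (illustrative, not code from the text) tallies class counts and (class, attribute = value) counts; the tuple layout and the tiny toy data set are assumptions made for the example.

from collections import Counter

# Toy training tuples: (age, income, student, credit_rating, class label).
# An assumed miniature of the training data, just to show the counting.
training = [
    ("youth", "high", "no", "fair", "no"),
    ("youth", "high", "no", "excellent", "no"),
    ("middle_aged", "high", "no", "fair", "yes"),
    ("senior", "medium", "yes", "fair", "yes"),
]
attributes = ("age", "income", "student", "credit_rating")

class_counts = Counter()  # one count per class C_i
pair_counts = Counter()   # one count per (class, A_k = x_k) pair

for *values, label in training:
    class_counts[label] += 1
    for attr, value in zip(attributes, values):
        pair_counts[(label, f"{attr}={value}")] += 1

# Estimate P(student = yes | buys_computer = yes) from the counts.
print(pair_counts[("yes", "student=yes")] / class_counts["yes"])  # 1/2 here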
But what if, say, there are no training tuples representing students for the class buys_computer = no, resulting in P(student = yes | buys_computer = no) = 0? In other words, what happens if we should end up with a probability value of zero for some P(x_k|C_i)? Plugging this zero value into Eq. (8.12) would return a zero probability for P(X|C_i), even though, without the zero probability, we may have ended up with a high probability, suggesting that X belonged to class C_i! A zero probability cancels the effects of all the other (posteriori) probabilities (on C_i) involved in the product.
There is a simple trick to avoid this problem. We can assume that our training database, D, is so large that adding one to each count that we need would only make a negligible difference in the estimated probability value, yet would conveniently avoid the case of probability values of zero. This technique for probability estimation is known as the Laplacian correction (or Laplace estimator); a brief sketch of the corrected estimate follows.
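To make the correction concrete, here is a short Python sketch (an illustration under the assumptions above, not code from the text): the count in the numerator is incremented by 1 and the denominator by the number of distinct values the attribute can take, so no estimate is ever exactly zero.

def conditional_prob(pair_count, class_count, n_values, smooth=True):
    """Estimate P(x_k | C_i); with the Laplacian correction, add 1 to
    the count and n_values (the number of distinct values of attribute
    A_k) to the denominator, so the estimate is never exactly zero."""
    if smooth:
        return (pair_count + 1) / (class_count + n_values)
    return pair_count / class_count

# No training tuples with student = yes in class buys_computer = no
# (0 of 5 tuples); student takes 2 distinct values (yes, no).
print(conditional_prob(0, 5, 2, smooth=False))  # 0.0 -- wipes out the product
print(conditional_prob(0, 5, 2, smooth=True))   # 1/7, small but nonzero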
 