• Hand and Yu (2001), “Idiot's Bayes - not so stupid after all?” (The whole paper is about why it doesn't suck, which is related to redundancies in language.)
Fancy It Up: Laplace Smoothing
Remember the $\theta_j$ from the previous section? That referred to the probability of seeing a given word (indexed by $j$) in a spam email. If you think about it, this is just a ratio of counts:

$$\theta_j = \frac{n_{jc}}{n_c},$$

where $n_{jc}$ denotes the number of spam emails containing that word and $n_c$ denotes the total number of spam emails.
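To make the counting concrete, here is a minimal sketch in Python (the toy corpus is entirely made up for illustration; nothing here is from the book):

```python
# Toy corpus of (email text, is_spam) pairs -- made up for illustration.
emails = [
    ("cheap viagra now", True),
    ("viagra discount meds", True),
    ("meeting at noon", False),
]

word = "viagra"
spam = [text for text, is_spam in emails if is_spam]
n_jc = sum(word in text.split() for text in spam)  # spam emails containing the word
n_c = len(spam)                                    # total number of spam emails

print(n_jc / n_c)  # 2/2 = 1.0: the raw estimate of theta_j
```

Note that the raw estimate here is exactly 1, which is precisely the degenerate behavior the next idea is designed to fix.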
Laplace Smoothing refers to the idea of replacing our straight-up estimate of $\theta_j$ with something a bit fancier:

$$\theta_{jc} = \frac{n_{jc} + \alpha}{n_c + \beta}$$
We might fix $\alpha = 1$ and $\beta = 10$, for example, to prevent the possibility of getting 0 or 1 for a probability, which we saw earlier happening with “viagra.” (A quick numerical sketch of the effect follows.)
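Here is a minimal sketch of the numbers involved (not from the book; the figure of 25 spam emails is a made-up example):

```python
def smoothed_theta(n_jc, n_c, alpha=1, beta=10):
    """Laplace-smoothed estimate (n_jc + alpha) / (n_c + beta)."""
    return (n_jc + alpha) / (n_c + beta)

# Suppose "viagra" appeared in every one of 25 spam emails: the raw
# estimate n_jc / n_c is exactly 1, the degenerate case noted earlier.
n_jc, n_c = 25, 25
print(n_jc / n_c)                 # 1.0
print(smoothed_theta(n_jc, n_c))  # 26/35, roughly 0.74: safely below 1

# And a word never seen in spam no longer gets probability 0.
print(smoothed_theta(0, n_c))     # 1/35, roughly 0.03
```

With $\alpha = 1$ and $\beta = 10$, no estimate can reach 0 or 1: the numerator is always at least 1, and since $n_{jc} \leq n_c$ it is always strictly less than the denominator.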
Does this seem totally ad hoc? Well, if we want to get fancy, we can see this as equivalent to having a prior and performing a maximum likelihood estimate. Let's get fancy! If we denote by $ML$ the maximum likelihood estimate, and by $D$ the dataset, then we have:
$$\theta_{ML} = \arg\max_{\theta} \, p(D \mid \theta)$$
In other words, the vector of values $\theta_j = n_{jc}/n_c$ is the answer to the question: for what value of $\theta$ were the data $D$ most probable? If we assume independent trials again, as we did in our first attempt at Naive Bayes, then we want to choose the $\theta_j$ to separately maximize the following quantity for each $j$:
$$\log\left(\theta_j^{\,n_{jc}} \left(1 - \theta_j\right)^{\,n_c - n_{jc}}\right)$$
If we take the derivative and set it to zero, we get:
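$$\frac{d}{d\theta_j}\Big[\,n_{jc}\log\theta_j + \left(n_c - n_{jc}\right)\log\left(1-\theta_j\right)\Big] = \frac{n_{jc}}{\theta_j} - \frac{n_c - n_{jc}}{1-\theta_j} = 0 \quad\Longrightarrow\quad \theta_j = \frac{n_{jc}}{n_c}$$

so the ratio of counts really is the maximum likelihood estimate. You can also sanity-check this numerically; below is a minimal sketch with made-up counts:

```python
import numpy as np

# Hypothetical counts: word j appeared in 5 of 20 spam emails.
n_jc, n_c = 5, 20

# The quantity from the text: n_jc*log(theta) + (n_c - n_jc)*log(1 - theta).
theta = np.linspace(0.001, 0.999, 999)
loglik = n_jc * np.log(theta) + (n_c - n_jc) * np.log(1 - theta)

print(theta[np.argmax(loglik)])  # ~0.25, i.e., n_jc / n_c
```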