• “Idiot's Bayes - not so stupid after all?” (The whole paper is about why it doesn't suck, which is related to redundancies in language.)
• “Naive Bayes at Forty: The Independence Assumption in Information Retrieval”
• “Spam Filtering with Naive Bayes - Which Naive Bayes?”
Fancy It Up: Laplace Smoothing
Remember the θ_j from the previous section? That referred to the probability of seeing a given word (indexed by j) in a spam email. If you think about it, this is just a ratio of counts: θ_j = n_jc / n_c, where n_jc denotes the number of spam emails containing that word and n_c denotes the total number of spam emails.
Laplace Smoothing refers to the idea of replacing our straight-up estimate of θ_j with something a bit fancier:
θ_jc = (n_jc + α) / (n_c + β)
We might fix α = 1 and β = 10, for example, to prevent the possibility of getting 0 or 1 for a probability, which we saw earlier happening with “viagra.”
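To see what the smoothing does numerically, here is a minimal Python sketch. The counts and the helper name smoothed_estimate are made up for illustration; they are not taken from the earlier example.

```python
# Minimal sketch of Laplace smoothing with hypothetical counts.

def smoothed_estimate(n_jc, n_c, alpha=1.0, beta=10.0):
    """Laplace-smoothed estimate (n_jc + alpha) / (n_c + beta)."""
    return (n_jc + alpha) / (n_c + beta)

# Suppose every one of 25 spam emails in a tiny training set contains "viagra".
n_c = 25    # total number of spam emails
n_jc = 25   # spam emails containing the word

raw = n_jc / n_c                         # 1.0 -- the extreme we want to avoid
smoothed = smoothed_estimate(n_jc, n_c)  # (25 + 1) / (25 + 10) ≈ 0.74

print(raw, smoothed)
```

As the counts grow, the smoothed estimate converges to the raw ratio, so the correction matters most for rare words and small samples.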
Does this seem totally ad hoc? Well, if we want to get fancy, we can see this as equivalent to having a prior and performing a maximum a posteriori estimate instead of a plain maximum likelihood estimate. Let's get fancy! If we denote by ML the maximum likelihood estimate and by D the dataset, then we have:
θ_ML = argmax_θ p(D | θ)
In other words, the vector of values θ_j = n_jc / n_c is the answer to the question: for what value of θ were the data D most probable? If we assume independent trials again, as we did in our first attempt at Naive Bayes, then p(D | θ) factors into one binomial-style term per word, so we can choose each θ_j to separately maximize the following quantity for each j:
log( θ_j^(n_jc) · (1 − θ_j)^(n_c − n_jc) )
If we take the derivative and set it to zero, we get θ_j = n_jc / n_c, which is exactly the ratio of counts we started with.
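As a sanity check (not from the book; the counts below are hypothetical), here is a quick numerical confirmation that this log-likelihood really peaks at the plain ratio of counts:

```python
# Numerical check that n_jc*log(theta) + (n_c - n_jc)*log(1 - theta)
# is maximized at theta = n_jc / n_c, using hypothetical counts.
import numpy as np

n_c, n_jc = 40, 12   # e.g., 12 of 40 spam emails contain the word

thetas = np.linspace(0.001, 0.999, 9999)
log_lik = n_jc * np.log(thetas) + (n_c - n_jc) * np.log(1 - thetas)

print(thetas[np.argmax(log_lik)])  # ≈ 0.30
print(n_jc / n_c)                  # 0.30
```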