$\theta_j = n_{jc} / n_c$
In other words, just what we had before. So what we've found is that
the maximum likelihood estimate recovers your result, as long as we
assume independence.
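To make the formula concrete, here is a tiny sketch with made-up counts; the names n_c and n_jc are placeholders for the total spam count and the word-in-spam count from your training data:

```python
# Made-up counts purely for illustration.
n_c = 1000   # total number of spam emails in the training set
n_jc = 40    # spam emails that contain the j-th word

# Maximum likelihood estimate: the raw fraction of spam emails containing the word.
theta_mle = n_jc / n_c
print(theta_mle)  # 0.04
```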
Now let's add a prior. For this discussion we can suppress the j from
the notation for clarity, but keep in mind that we are fixing the jth
word to work on. Denote by $\theta_{MAP}$ the maximum a posteriori
estimate:
$\theta_{MAP} = \operatorname{argmax}_{\theta} \, p(\theta \mid D)$
This similarly answers the question: given the data I saw, which parameter θ is the most likely?
Here we will apply the spirit of Bayes's Law to transform $p(\theta \mid D)$, the quantity being maximized, into something that is, up to a constant, equal to $p(D \mid \theta) \cdot p(\theta)$. The term $p(\theta)$ is referred to as the "prior," and we have to make an assumption about its form to make this useful. If we assume the probability distribution of $\theta$ is of the form $\theta^{\alpha} (1 - \theta)^{\beta}$ for some α and β, then we recover the Laplace-smoothed result.
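As a sketch of where that takes you (not code from the book): with a prior proportional to $\theta^{\alpha}(1-\theta)^{\beta}$ and a binomial likelihood, the posterior is maximized at $(n_{jc} + \alpha)/(n_c + \alpha + \beta)$, which is exactly the smoothed count ratio. The counts below are made up.

```python
def theta_map(n_jc: int, n_c: int, alpha: float, beta: float) -> float:
    """MAP estimate for one word: the smoothed fraction of spam emails
    containing the word, under a prior proportional to
    theta**alpha * (1 - theta)**beta."""
    return (n_jc + alpha) / (n_c + alpha + beta)

# A word never seen in spam no longer gets an estimate of exactly 0.
print(theta_map(n_jc=0, n_c=1000, alpha=1.0, beta=1.0))  # ~0.000998
# Classical add-one (Laplace) smoothing corresponds to alpha = beta = 1.
```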
Is That a Reasonable Assumption?
Recall that θ is the chance that a word is in spam if that word is in
some email. On the one hand, as long as both α > 0 and β > 0, this
distribution vanishes at both 0 and 1. This is reasonable: you want
very few words to be expected to never appear in spam or to always
appear in spam.
On the other hand, when α and β are large, the shape of the distribution is bunched in the middle, which reflects the prior that most
words are equally likely to appear in spam or outside spam. That
doesn't seem true either.
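To make the two regimes concrete, here is a quick check (not from the book) of the unnormalized prior θ^α(1 − θ)^β at a few values of θ, once with a small α = β and once with a large one:

```python
# Sketch: evaluate the (unnormalized) prior theta**alpha * (1 - theta)**beta
# at a few points, for a small and a large choice of alpha = beta.

def prior(theta, alpha, beta):
    return theta**alpha * (1 - theta)**beta

for a in (0.2, 10):
    values = [prior(t, a, a) for t in (0.01, 0.1, 0.5)]
    print(a, [f"{v:.3g}" for v in values])

# alpha = beta = 0.2: roughly flat (0.397, 0.618, 0.758), vanishing only right at 0 and 1.
# alpha = beta = 10:  sharply bunched near 1/2 (9.0e-21, 3.5e-11, 9.5e-07).
```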
A compromise would have α and β be positive but small, like 1/5. That
would keep your spam filter from being too overzealous without having the wrong idea. Of course, you could relax this prior as you have
more and better data; in general, strong priors are only needed when
you don't have sufficient data.
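To see how the strength of the prior interacts with the amount of data, here is a small illustration reusing the hypothetical theta_map sketch from above; all counts are invented.

```python
def theta_map(n_jc, n_c, alpha, beta):
    return (n_jc + alpha) / (n_c + alpha + beta)

# Tiny sample: a strong prior drags the estimate toward 1/2.
print(theta_map(1, 5, 0.2, 0.2))   # ~0.222  (weak prior, alpha = beta = 1/5)
print(theta_map(1, 5, 10, 10))     # 0.44    (strong prior)

# Plenty of data: both priors give nearly the raw fraction 0.2.
print(theta_map(200, 1000, 0.2, 0.2))  # ~0.200
print(theta_map(200, 1000, 10, 10))    # ~0.206
```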
 