$\theta_j = n_{jc} / n_c$
In other words, just what we had before. So what we've found is that
the maximum likelihood estimate recovers your result, as long as we
assume independence.
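To make the formula concrete, here is a tiny sketch with made-up counts; the names n_c and n_jc are placeholders for the total spam count and the word-in-spam count from your training data:

```python
# Made-up counts purely for illustration.
n_c = 1000   # total number of spam emails in the training set
n_jc = 40    # spam emails that contain the j-th word

# Maximum likelihood estimate: the raw fraction of spam emails containing the word.
theta_mle = n_jc / n_c
print(theta_mle)  # 0.04
```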
Now let's add a prior. For this discussion we can suppress the j from
the notation for clarity, but keep in mind that we are fixing the jth
word to work on. Denote by $\theta_{MAP}$ the maximum a posteriori
estimate:
$\theta_{MAP} = \operatorname{argmax}_{\theta} \, p(\theta \mid D)$
This similarly answers the question: given the data I saw, which parameter θ is the most likely?
Here we will apply the spirit of Bayes's Law to transform $p(\theta \mid D)$, the quantity being maximized, into something that is, up to a constant, equal to $p(D \mid \theta) \cdot p(\theta)$. The term $p(\theta)$ is referred to as the "prior," and we have to make an assumption about its form to make this useful. If we assume the probability distribution of $\theta$ is of the form $\theta^{\alpha} (1 - \theta)^{\beta}$ for some α and β, then we recover the Laplace-smoothed result.
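As a sketch of where that takes you (not code from the book): with a prior proportional to $\theta^{\alpha}(1-\theta)^{\beta}$ and a binomial likelihood, the posterior is maximized at $(n_{jc} + \alpha)/(n_c + \alpha + \beta)$, which is exactly the smoothed count ratio. The counts below are made up.

```python
def theta_map(n_jc: int, n_c: int, alpha: float, beta: float) -> float:
    """MAP estimate for one word: the smoothed fraction of spam emails
    containing the word, under a prior proportional to
    theta**alpha * (1 - theta)**beta."""
    return (n_jc + alpha) / (n_c + alpha + beta)

# A word never seen in spam no longer gets an estimate of exactly 0.
print(theta_map(n_jc=0, n_c=1000, alpha=1.0, beta=1.0))  # ~0.000998
# Classical add-one (Laplace) smoothing corresponds to alpha = beta = 1.
```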
Is That a Reasonable Assumption?
Recall that θ is the chance that a word is in spam if that word is in
some email. On the one hand, as long as both α > 0 and β > 0, this
distribution vanishes at both 0 and 1. This is reasonable: you want
very few words to be expected to never appear in spam or to always
appear in spam.
On the other hand, when α and β are large, the shape of the distribution is bunched in the middle, which reflects the prior that most
words are equally likely to appear in spam or outside spam. That
doesn't seem true either.
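To make the two regimes concrete, here is a quick check (not from the book) of the unnormalized prior θ^α(1 − θ)^β at a few values of θ, once with a small α = β and once with a large one:

```python
# Sketch: evaluate the (unnormalized) prior theta**alpha * (1 - theta)**beta
# at a few points, for a small and a large choice of alpha = beta.

def prior(theta, alpha, beta):
    return theta**alpha * (1 - theta)**beta

for a in (0.2, 10):
    values = [prior(t, a, a) for t in (0.01, 0.1, 0.5)]
    print(a, [f"{v:.3g}" for v in values])

# alpha = beta = 0.2: roughly flat (0.397, 0.618, 0.758), vanishing only right at 0 and 1.
# alpha = beta = 10:  sharply bunched near 1/2 (9.0e-21, 3.5e-11, 9.5e-07).
```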
A compromise would have α and β be positive but small, like 1/5. That
would keep your spam filter from being too overzealous without having the wrong idea. Of course, you could relax this prior as you have
more and better data; in general, strong priors are only needed when
you don't have sufficient data.
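To see how the strength of the prior interacts with the amount of data, here is a small illustration reusing the hypothetical theta_map sketch from above; all counts are invented.

```python
def theta_map(n_jc, n_c, alpha, beta):
    return (n_jc + alpha) / (n_c + alpha + beta)

# Tiny sample: a strong prior drags the estimate toward 1/2.
print(theta_map(1, 5, 0.2, 0.2))   # ~0.222  (weak prior, alpha = beta = 1/5)
print(theta_map(1, 5, 10, 10))     # 0.44    (strong prior)

# Plenty of data: both priors give nearly the raw fraction 0.2.
print(theta_map(200, 1000, 0.2, 0.2))  # ~0.200
print(theta_map(200, 1000, 10, 10))    # ~0.206
```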
 