Naive Bayes
So are we at a loss now that two methods we're familiar with, linear
regression and k-NN, won't work for the spam filter problem? No!
Naive Bayes is another classification method at our disposal that scales
well and has nice intuitive appeal.
Bayes' Law
Let's start with an even simpler example than the spam filter to get a
feel for how Naive Bayes works. Let's say we're testing for a rare disease,
where 1% of the population is infected. We have a highly sensitive and
specific test, which is not quite perfect:
• 99% of sick patients test positive.
• 99% of healthy patients test negative.
Given that a patient tests positive, what is the probability that the
patient is actually sick?
A naive approach to answering this question is to imagine we have
100 × 100 = 10,000 perfectly representative people. That would mean
that 100 are sick and 9,900 are healthy. After giving all of them the
test, we'd get 99 sick people testing positive, but also 99 healthy
people testing positive (1% of 9,900). In other words, if you test
positive, you're equally likely to be healthy or sick: the answer is 50%.
A tree diagram of this approach is shown in Figure 4-3.
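The tally above is easy to check mechanically. The sketch below is written in Python (the original text contains no code, so the language is our choice) and recomputes the answer two ways: once by counting an imagined population, exactly as in the paragraph above, and once with the standard form of Bayes' law, P(sick | +) = P(+ | sick) P(sick) / P(+). The function names are hypothetical, chosen only for illustration, and exact fractions are used so the two results can be compared without floating-point round-off.

```python
from fractions import Fraction


# Hypothetical helper: tally an imagined, perfectly representative population.
def posterior_by_counting(population, prevalence, sensitivity, specificity):
    """Return P(sick | positive) as the share of positive testers who are sick."""
    sick = population * prevalence
    healthy = population - sick
    true_positives = sick * sensitivity            # sick people who test positive
    false_positives = healthy * (1 - specificity)  # healthy people who test positive
    return true_positives / (true_positives + false_positives)


# Hypothetical helper: the same quantity via Bayes' law.
def posterior_by_bayes(prevalence, sensitivity, specificity):
    """Return P(sick | +) = P(+ | sick) P(sick) / P(+)."""
    # Total probability of a positive test: sick and positive, or healthy and positive.
    p_positive = sensitivity * prevalence + (1 - specificity) * (1 - prevalence)
    return sensitivity * prevalence / p_positive


# Numbers from the example: 1% prevalence, 99% sensitivity, 99% specificity.
prevalence = Fraction(1, 100)
sensitivity = Fraction(99, 100)
specificity = Fraction(99, 100)

by_count = posterior_by_counting(10_000, prevalence, sensitivity, specificity)
by_bayes = posterior_by_bayes(prevalence, sensitivity, specificity)
print(by_count, by_bayes)  # prints "1/2 1/2"
```

Both routes give 1/2, matching the 50% from the tree diagram; the counting version is simply Bayes' law with every probability multiplied by the population size.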
Figure 4-3. Tree diagram to build intuition
Let's do it again using fancy notation so we'll feel smart.