How it works…
The real test of this system is how well it has modeled the population. We can find that out
easily by dividing the total African-American population by the total population and
comparing the result with the mean of the sampled parameters:
user=> (/ (i/sum (i/sel census-race :cols :black))
          (i/sum (i/sel census-race :cols :total)))
0.21676297226785196
user=> (- *1 (s/mean (:black theta-params)))
0.0375200896519331
So the model's mean estimate falls about 0.038 below the actual proportion: in the right
neighborhood, but not especially close.
Let's reiterate what we've learned about Bayesian analysis by looking at what this process
has done. It started out with a standard distribution (the Dirichlet distribution) and, based
upon input data from the sample, updated its estimate of the probability distribution of the
population that the sample was drawn from.
Often Bayesian methods provide better results than alternative methods, and they're a
powerful addition to any data worker's tool set.
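To make the updating step concrete, here is a minimal sketch in plain Clojure. It is not the recipe's code: it uses the closed-form posterior mean of a Dirichlet distribution instead of sampling, and the function name, the prior pseudo-counts, and the sample counts are all hypothetical values chosen for illustration.

```clojure
(defn dirichlet-posterior-mean
  "Posterior mean of each category's proportion, given Dirichlet prior
  pseudo-counts `alphas` and observed per-category `counts`.
  (Hypothetical helper, not part of Incanter.)"
  [alphas counts]
  (let [posterior (map + alphas counts)   ; updating = adding counts to the prior
        total     (reduce + posterior)]
    (mapv #(double (/ % total)) posterior)))

;; Hypothetical sample of 100 observations in three categories,
;; with a uniform prior of one pseudo-count per category.
(dirichlet-posterior-mean [1 1 1] [20 5 75])
```

The more observations the sample contains, the less the prior pseudo-counts matter, which is why the estimate moves toward the sample proportions as data accumulates.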
There's more...
Incanter includes functions that sample from a number of Bayesian distributions, found at
http://liebke.github.com/incanter/bayes-api.html .
On Bayesian approaches to data analysis, and life in general, see
http://bayes.bgsu.edu/nsf_web/tutorial/a_brief_tutorial.htm and
http://dartthrowingchimp.wordpress.com/2012/12/31/dr-bayes-or-how-i-learned-to-stop-worrying-and-love-updating/ .
Finding data errors with Benford's law
Benford's law is a curious observation about the distribution of the first digits of numbers
in many naturally occurring datasets. In sequences that conform to Benford's law, the first
digit will be 1 about a third of the time, and higher digits will occur progressively less often.
However, manually constructed data rarely looks like this. Because of that, the lack of a
Benford's law distribution is evidence that a dataset may have been manually constructed
rather than naturally occurring.
For example, this has been shown to hold true in financial data, and investigators leverage
this for fraud detection. The US Internal Revenue Service reportedly uses it for identifying
potential tax fraud, and financial auditors also use it.
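The proportions that Benford's law predicts come from a simple formula: the leading digit d is expected with probability log10(1 + 1/d). As a rough sketch in plain Clojure (the function names here are hypothetical helpers, not Incanter functions), the expected proportions and the observed first-digit proportions of a dataset could be computed like this:

```clojure
(defn benford-expected
  "Expected proportion of leading digit d (1-9) under Benford's law."
  [d]
  (Math/log10 (+ 1.0 (/ 1.0 d))))

(defn first-digit
  "Leading decimal digit of a non-zero integer n."
  [n]
  (loop [m (if (neg? n) (- n) n)]
    (if (< m 10)
      m
      (recur (quot m 10)))))

(defn first-digit-freqs
  "Observed proportion of each leading digit 1-9 in a non-empty
  collection of non-zero integers."
  [xs]
  (let [counts (frequencies (map first-digit xs))
        n      (count xs)]
    (into (sorted-map)
          (for [d (range 1 10)]
            [d (double (/ (get counts d 0) n))]))))

;; Expected proportions for the digits 1 through 9.
(map benford-expected (range 1 10))
```

Comparing the output of first-digit-freqs for a dataset against these expected values gives a quick, informal check. A large gap is a reason to look more closely at the data, not proof of a problem.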
 