Some of the recipes in this chapter will involve generating simple summary statistics. Some
will involve further massaging our data to make trends and relationships clearer. We'll
then look at different ways to model the relationships in our data. Finally, we'll look at
Benford's law, a curious observation about the behavior of naturally occurring sequences of
numbers, which we can leverage to discover problems with our data.
Generating summary statistics with $rollup
One of the basic ways of getting a grip on a dataset is to look at some summary statistics:
measures of centrality and variance, such as mean and standard deviation. These provide
useful insights into our data and help us decide what questions to ask next and how best
to proceed.
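To make the two statistics concrete, here is a minimal sketch in plain Clojure, with no Incanter required. The function names mean and sample-sd and the sample xs are made up for illustration, and the standard deviation uses the n-1 (sample) form, which is an assumption about which variant is wanted.

```clojure
;; Arithmetic mean: the sum of the values divided by how many there are.
;; Coercing the count to a double avoids returning a ratio.
(defn mean [xs]
  (/ (reduce + xs) (double (count xs))))

;; Sample standard deviation: the square root of the sum of squared
;; deviations from the mean, divided by (n - 1).
(defn sample-sd [xs]
  (let [m  (mean xs)
        n  (count xs)
        ss (reduce + (map (fn [x] (let [d (- x m)] (* d d))) xs))]
    (Math/sqrt (/ ss (dec n)))))

;; A small hypothetical sample, not taken from the census data.
(def xs [2 4 4 4 5 5 7 9])

(mean xs)       ; the sum is 40 over 8 values, so 5.0
(sample-sd xs)  ; squared deviations sum to 32, so the square root of 32/7
```

The mean tells us where the values are centered, and the standard deviation tells us how widely they spread around that center.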
Getting ready
First, we'll need to make sure Incanter is listed in the dependencies of our Leiningen
project.clj file:
(defproject statim "0.1.0"
  :dependencies [[org.clojure/clojure "1.6.0"]
                 [incanter "1.5.5"]])
And we'll need to require these libraries in our script or REPL:
(require '[incanter.core :as i]
         'incanter.io
         '[incanter.stats :as s])
Finally, we'll use the dataset of census race data that we compiled for the Grouping data with
$group-by recipe in Chapter 6, Working with Incanter Datasets. We'll bind the file name to the
name data-file. You can download this from
http://www.ericrochester.com/clj-data-analysis/data/all_160.P3.csv:
(def data-file "data/all_160.P3.csv")
How to do it…
To generate summary statistics in Incanter, we'll use the $rollup function.
First, we'll load the dataset and bind it to the name census:
(def census (incanter.io/read-dataset data-file :header true))
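Before applying $rollup to the census data, it may help to see its shape on a toy dataset. The following is a minimal sketch, assuming Incanter 1.5.5 as listed in project.clj. The dataset toy, its :state and :pop columns, and all of its values are hypothetical and are not taken from the census file. As we understand it, $rollup takes a summary function (or a keyword such as :mean), the column to summarize, the column to group by, and the dataset.

```clojure
(require '[incanter.core :as i]
         '[incanter.stats :as s])

;; A tiny hypothetical dataset: one row per place, with a state code
;; and a population. These are not values from the census file.
(def toy
  (i/dataset [:state :pop]
             [["VA" 100]
              ["VA" 300]
              ["NC" 200]
              ["NC" 400]
              ["NC" 600]]))

;; Mean of :pop within each :state, using the built-in :mean keyword.
(def mean-pop (i/$rollup :mean :pop :state toy))

;; A function that takes a sequence of values can be passed instead,
;; such as the standard deviation from incanter.stats.
(def sd-pop (i/$rollup s/sd :pop :state toy))
```

Each call returns a new dataset with one row per group: the grouping column alongside the summarized column.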
 