How to do it…
For this recipe, we'll see how to fit a formula to a set of data points. In this case, we'll look
for a relationship between the speed limit and the number of fatal accidents that occur over
a year:
1. First, we need to load the data from the tab-delimited files:
(def data
  (incanter.io/read-dataset data-file
                            :header true
                            :delim \tab))
2. From this data, we'll use the $rollup function to calculate the number of fatalities
per speed limit, and then filter out any invalid speed limits (empty values). We'll then
sort it by speed limit and create a new dataset. That seems like a mouthful, but it's
really quite simple:
(def fatalities
  (->> data
       (i/$rollup :count :Obs. :spdlim)
       (i/$where {:spdlim {:$ne "."}})
       (i/$where {:spdlim {:$ne 0}})
       (i/$order :spdlim :asc)
       (i/to-list)
       (i/dataset [:speed-limit :fatalities])))
3. We'll now pull out the columns to make them easier to refer to later:
(def speed-limit (i/sel fatalities :cols :speed-limit))
(def fatality-count (i/sel fatalities :cols :fatalities))
4. The first difficult part of non-linear models is that the general shape of the formula
isn't predetermined. We have to figure out what type of formula might best fit the
data. To do that, let's graph it and try to think of a class of functions that roughly
matches the shape of the data:
(def chart
  (doto
    (c/scatter-plot speed-limit fatality-count
                    :title "Fatalities by Speed Limit (2010)"
                    :x-label "Speed Limit"
                    :y-label "Fatality Count"
                    :legend true)
    i/view))
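To make the aggregation in steps 2 and 3 more concrete, here is a minimal sketch of the same idea in plain Clojure, without Incanter. It is only an illustration: the sample rows, the fatalities-by-limit function, and the names summary, limits, and counts are hypothetical and not part of the recipe. It assumes that each input row stands for one fatal accident and that invalid speed limits appear as "." or 0, as in the filters above.

```clojure
;; Hypothetical sample rows; each map stands for one observation in the data.
(def sample-rows
  [{:spdlim 55}
   {:spdlim 30}
   {:spdlim 55}
   {:spdlim "."}   ; missing speed limit
   {:spdlim 0}     ; invalid speed limit
   {:spdlim 30}
   {:spdlim 55}])

(defn fatalities-by-limit
  "Counts rows per speed limit, drops invalid limits, and sorts ascending."
  [rows]
  (->> rows
       (group-by :spdlim)
       (map (fn [[limit group]]
              {:speed-limit limit :fatalities (count group)}))
       ;; Remove invalid limits before sorting, so only numbers are compared.
       (remove #(contains? #{"." 0} (:speed-limit %)))
       (sort-by :speed-limit)
       vec))

(def summary (fatalities-by-limit sample-rows))

;; Pull each column out as a vector, in the spirit of step 3.
(def limits (mapv :speed-limit summary))
(def counts (mapv :fatalities summary))
```

With these sample rows, limits holds the two valid speed limits in ascending order and counts holds the matching number of rows for each, which is the pair of sequences the scatter plot needs.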