Parallelizing processing with Incanter
In the upcoming chapters, many recipes will feature Incanter. One of its strengths is that it
uses the Parallel Colt Java library (http://sourceforge.net/projects/parallelcolt/) to handle
its processing. So when you use many of its matrix, statistical, or other functions, they're
automatically executed on multiple threads.
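Incanter hides the threading from you. To picture what data-parallel work looks like in general, the sketch below splits a sum of squares into chunks and hands them to Clojure's pmap, which evaluates the chunks concurrently as separate futures before the partial sums are combined. It illustrates the general idea only; it is not how Parallel Colt is implemented, and the function name parallel-sum-of-squares is hypothetical.

```clojure
;; Illustrative only: this is NOT how Parallel Colt works internally.
;; parallel-sum-of-squares is a hypothetical name used for this sketch.
(defn parallel-sum-of-squares
  "Splits xs into chunks of chunk-size, sums the squares within each chunk
  concurrently via pmap, then adds the partial sums together."
  [xs chunk-size]
  (->> (partition-all chunk-size xs)
       ;; each chunk is evaluated in its own future by pmap
       (pmap (fn [chunk] (reduce + (map #(* % %) chunk))))
       ;; combine the per-chunk results on the calling thread
       (reduce +)))

;; Sum of the squares of 1..100, processed in four chunks of 25
(parallel-sum-of-squares (range 1 101) 25)
```

For inputs this small, the coordination overhead outweighs any speedup; chunking exists so each thread gets a meaningful amount of work. In a standalone script, call (shutdown-agents) when you are done, because the threads behind pmap can otherwise keep the JVM alive for about a minute.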
For this, we'll revisit the Virginia housing-unit census data from the Managing program
complexity with STM recipe in Chapter 3, Managing Complexity with Concurrent
Programming. This time, we'll fit a linear regression to it.
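In the recipe, Incanter performs the fit for us (its incanter.stats namespace provides regression functions such as linear-model). As a reference point, here is a minimal plain-Clojure sketch of what a one-predictor least-squares fit computes: the slope is Σ(x − x̄)(y − ȳ) / Σ(x − x̄)², and the intercept is ȳ − slope·x̄. The function name simple-linear-regression and the toy data are our own assumptions; this is not Incanter's implementation.

```clojure
;; Illustrative sketch (not Incanter's implementation): ordinary least squares
;; for a single predictor, y ≈ intercept + slope * x.
;; simple-linear-regression is a hypothetical name.
(defn simple-linear-regression
  "Returns {:intercept b0 :slope b1} minimizing the sum of squared residuals."
  [xs ys]
  (letfn [(average [vs] (/ (reduce + vs) (count vs)))]
    (let [x-bar (average xs)
          y-bar (average ys)
          ;; Σ (x - x̄)(y - ȳ)
          sxy (reduce + (map (fn [x y] (* (- x x-bar) (- y y-bar))) xs ys))
          ;; Σ (x - x̄)²
          sxx (reduce + (map (fn [x] (let [d (- x x-bar)] (* d d))) xs))
          slope (/ sxy sxx)]
      {:intercept (- y-bar (* slope x-bar))
       :slope slope})))

;; Toy points lying exactly on y = 1 + 2x (made-up values, not census data)
(simple-linear-regression [1 2 3 4] [3 5 7 9])
```

With integer inputs, Clojure's exact ratio arithmetic keeps the result exact (intercept 1, slope 2, possibly printed as BigInts); real census columns would normally be doubles.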
Getting ready
We need to add Incanter to our list of dependencies in our Leiningen project.clj file:
(defproject parallel-data "0.1.0"
  :dependencies [[org.clojure/clojure "1.6.0"]
                 [incanter "1.5.5"]])
We also need to pull these libraries into our REPL or script:
(use '(incanter core datasets io optimize charts stats))
We'll use the data file from the Managing program complexity with STM recipe in Chapter 3,
Managing Complexity with Concurrent Programming. We can bind that filename to the
name data-file, just as we did in that recipe:
(def data-file "data/all_160_in_51.P35.csv")
How to do it…
For this recipe, we'll extract the data to be analyzed and perform a linear regression. We'll
then graph the data.
1. First, we'll read in the data and pull the population and housing-unit columns into
a matrix of their own:
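;; Read the CSV (first row is the header), keep only the population and
;; housing-unit columns, and convert the result to a two-column matrix.
(def data (to-matrix
            (sel (read-dataset data-file :header true)
                 :cols [:POP100 :HU100])))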
2. From this matrix, we can bind the population and the housing-unit data to their
own names:
(def population (sel data :cols 0))     ; column 0 holds POP100
(def housing-units (sel data :cols 1))  ; column 1 holds HU100
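In these two steps, Incanter's read-dataset, sel, and to-matrix do the parsing and column selection. To make the data flow concrete without Incanter, here is a plain-Clojure sketch of the same two steps. It assumes a simple CSV with a header row and no quoted or comma-containing fields; the helpers csv->rows and nth-column are hypothetical, and the inline CSV text is made-up toy data standing in for the census file.

```clojure
(require '[clojure.string :as str])

(defn csv->rows
  "Hypothetical helper: parses simple CSV text (header line first, no quoted
  fields) and keeps only the named columns, as vectors of doubles."
  [csv-text col-names]
  (let [[header & rows] (map #(str/split % #",") (str/split-lines csv-text))
        ;; position of every header name, e.g. {"POP100" 1, ...}
        index-of (zipmap header (range))
        positions (map index-of col-names)]
    (mapv (fn [row] (mapv #(Double/parseDouble (nth row %)) positions))
          rows)))

(defn nth-column
  "Hypothetical helper: pulls column i out of a vector of rows."
  [rows i]
  (mapv #(nth % i) rows))

;; Toy CSV text (made-up values) with the same column names as the census file
(def toy-rows (csv->rows "GEOID,POP100,HU100\n1,100,40\n2,250,90\n"
                         ["POP100" "HU100"]))
(def toy-population (nth-column toy-rows 0))    ; [100.0 250.0]
(def toy-housing-units (nth-column toy-rows 1)) ; [40.0 90.0]
```

Incanter's matrices are backed by Parallel Colt, which is what lets later operations on population and housing-units run in parallel; the plain vectors here are only for illustrating the selection logic.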
 