Composing Cascalog queries
One of the best things about Cascalog queries is that they can be composed together.
Similar to composing functions, this can be a good way to build a complex process from
smaller, easy-to-understand parts.
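As a minimal sketch of what composition looks like, a query built with <- can itself be used as a generator inside another query. The example below assumes Cascalog is on the classpath and uses a small in-memory dataset invented purely for illustration; the names people, adults, and adult-names are hypothetical and are not part of this recipe.

```clojure
(require '[cascalog.api :refer [<- ??-]])

;; Toy in-memory tuples, made up for this sketch (not census data).
(def people [["alice" 28] ["bob" 33] ["carol" 41]])

;; First query: keep only the people aged 30 or over.
(def adults
  (<- [?name ?age]
      (people ?name ?age)
      (>= ?age 30)))

;; Second query: uses the first query as its generator and
;; projects just the name field.
(def adult-names
  (<- [?name]
      (adults ?name ?age)))

;; ??- runs the query and returns one sequence of tuples per query.
(def results (first (??- adult-names)))
```

Neither query runs when it is defined; nothing is executed until ??- (or ?-) is called, which is what makes it cheap to build larger queries out of smaller ones.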
In this recipe, we'll parse the Virginia census data we first used in the Managing program
complexity with STM recipe in Chapter 3, Managing Complexity with Concurrent Programming.
You can download this data from
http://www.ericrochester.com/clj-data-analysis/data/all_160_in_51.P35.csv.
We'll also use a new census data file that contains the race data. You can download it from
http://www.ericrochester.com/clj-data-analysis/data/all_160_in_51.P3.csv.
Getting ready
Since we're reading CSV, we'll need to use the dependencies and imports from the
Parsing CSV files with Cascalog recipe.
We'll also use the hfs-text-delim function from that recipe and ->long from the
Aggregating data with Cascalog recipe.
We'll also need the two data files,
http://www.ericrochester.com/clj-data-analysis/data/all_160_in_51.P35.csv and
http://www.ericrochester.com/clj-data-analysis/data/all_160_in_51.P3.csv.
We'll put them into the data directory and bind their paths to names, as follows:
(def families-file "data/all_160_in_51.P35.csv")
(def race-file "data/all_160_in_51.P3.csv")
How to do it…
We'll read these datasets and convert some of the fields in each to integers. Then we'll join
the two together and select only a few of the fields.
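Before working through the steps on the real files, the overall shape (convert fields, then join, then select) can be sketched on in-memory data. This is only an illustration under stated assumptions: the rows are invented, str->long is a hypothetical stand-in for the ->long helper from the earlier recipe, and the field names are not the census column names. In Cascalog, using the same variable (?geoid here) in two generators is what makes the join implicit.

```clojure
(require '[cascalog.api :refer [<- ??-]])

;; Invented toy rows: [id, name, family count as string] and
;; [id, total population as string]. Not taken from the census files.
(def toy-families [["0001" "Town A" "120"] ["0002" "Town B" "45"]])
(def toy-race     [["0001" "300"] ["0002" "110"]])

;; Hypothetical stand-in for the recipe's ->long helper.
(defn str->long [s] (Long/parseLong s))

;; Each sub-query reads one dataset and converts its numeric field.
(def families-q
  (<- [?geoid ?name ?families]
      (toy-families ?geoid ?name ?families-str)
      (str->long ?families-str :> ?families)))

(def race-q
  (<- [?geoid ?total]
      (toy-race ?geoid ?total-str)
      (str->long ?total-str :> ?total)))

;; Sharing ?geoid between the two generators joins them;
;; the output vector selects only the fields we want.
(def joined
  (<- [?name ?families ?total]
      (families-q ?geoid ?name ?families)
      (race-q ?geoid ?total)))

(def joined-rows (first (??- joined)))
```

Keeping the parsing in the two sub-queries means the joining query only has to say which fields line up and which ones to keep.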
1. We'll define a query that reads the families data file and converts the integer fields
to numbers:
(def family-data
  (<- [?GEOID ?SUMLEV ?STATE
       ?NAME ?POP100 ?HU100 ?P035001]
      ((hfs-text-delim families-file
 