Composing Cascalog queries
One of the best things about Cascalog queries is that they can be composed together.
Similar to composing functions, this can be a good way to build a complex process from
smaller, easy-to-understand parts.
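As a minimal sketch of this idea, one query can act as the generator for another. The example assumes the Cascalog 1.x namespaces (cascalog.api and cascalog.ops). The people dataset and the names adults and adult-count are made up for illustration and are not part of this recipe.

```clojure
(use 'cascalog.api)
(require '[cascalog.ops :as c])

;; Hypothetical in-memory dataset of [name age] tuples, for illustration only.
(def people [["alice" 28] ["bob" 33] ["carol" 17]])

;; A small query: keep only the people aged 18 or older.
(def adults
  (<- [?name ?age]
      (people ?name ?age)
      (>= ?age 18)))

;; A second query that uses the first one as its generator and counts its rows.
(def adult-count
  (<- [?count]
      (adults _ _)
      (c/count ?count)))

;; Run the composed query and return its results as Clojure sequences.
(??- adult-count)
```

Neither definition runs anything by itself. Cascalog plans and executes the whole pipeline only when the final query is run, so small queries such as adults can be reused in several larger ones.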
In this recipe, we'll parse the Virginia census data we first used in the Managing program complexity with STM recipe in Chapter 3, Managing Complexity with Concurrent Programming.
You can download this data from http://www.ericrochester.com/clj-data-analysis/data/all_160_in_51.P35.csv. We'll also use a new census data file that contains the race data. You can download it from
http://www.ericrochester.com/
Getting ready
Since we're reading CSV, we'll need to use the dependencies and imports from the Parsing CSV files with Cascalog recipe.
We'll also use the hfs-text-delim function from that recipe and ->long from the Aggregating data with Cascalog recipe.
Also, we'll need the data files from
http://www.ericrochester.com/clj-data-
directory, as follows:
(def families-file "data/all_160_in_51.P35.csv")
(def race-file "data/all_160_in_51.P3.csv")
How to do it…
We'll read these datasets and convert some of the fields in each to integers. Then we'll join the two together and select only a few of the fields.
1. We'll define a query that reads the families data file and converts the integer fields to numbers:
(def family-data
  (<- [?GEOID ?SUMLEV ?STATE
       ?NAME ?POP100 ?HU100 ?P035001]
      ((hfs-text-delim families-file