2.
Now let's try something more complicated. We'll find out how many flights leave each
airport. To do this, we first need to define a mapping function in order to convert the
values in a column to longs, using defmapfn. We'll use this to convert the flights
column to numbers, and we'll use the c/sum function to aggregate those by airport:
user=> (defmapfn ->long
         "Converts a value to a long."
         [value]
         (Long/parseLong value))
user=> (?<- (stdout)
            [?origin_airport ?count]
            ((hfs-text-delim "data/16285/flights_with_colnames.csv"
                             :has-header true)
             ?origin_airport _ _ ?flights _)
            (:distinct true)
            (->long ?flights :> ?f)
            (c/sum ?f :> ?count))
RESULTS
-----------------------
1B11B1 1
ABE 197049
ABI 50043
ABQ 758168
ABR 30832
ABY 34298
This query is very similar to the previous one. We use the map function to convert
the ?flights column to numbers before it is aggregated, and we include an aggregator
predicate, c/sum. In the output bindings, we include both the value that we want the
data grouped on (?origin_airport) and the aggregated binding (?count). Any output
binding that isn't produced by an aggregator acts as a grouping key.
It's this simple. Cascalog takes care of the rest.
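To make the grouping concrete, here is a minimal sketch in plain Clojure (no Cascalog or Hadoop) of what the query computes: parse the flights value, group by origin airport, and sum within each group. The sample rows and the names str->long and flights-by-airport are made up for illustration; they are not part of Cascalog or of the flights dataset.

```clojure
;; Hypothetical sample rows: [origin-airport flights-as-string].
;; These values are invented for illustration only.
(def sample-rows
  [["ABE" "3"] ["ABQ" "10"] ["ABE" "4"] ["ABQ" "1"]])

(defn str->long
  "Converts a string to a long, like the ->long map function above."
  [value]
  (Long/parseLong value))

(defn flights-by-airport
  "Sums the parsed flights value for each origin airport.
  Returns a sorted map of airport code to total."
  [rows]
  (reduce (fn [totals [airport flights]]
            ;; fnil supplies 0 the first time an airport is seen.
            (update totals airport (fnil + 0) (str->long flights)))
          (sorted-map)
          rows))

(flights-by-airport sample-rows)
```

In the Cascalog query, the grouping key is implied by the output bindings rather than spelled out as it is here, and the work is distributed across the cluster instead of running in a single reduce.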
There's more
Cascalog provides a number of other aggregator functions as well. Some functions that you'll
want to use regularly include count, max, min, sum, and avg. See the documentation for the
built-in operations (https://github.com/nathanmarz/cascalog/wiki/Built-in-operations)
for a more complete list.
We'll also talk more about defmapfn in the next recipe, Defining new Cascalog operators.
 