[(inc n) (+ total (count string))])
([[n total]] [(float (/ total n))]))
(?<- (stdout)
     [?dr ?companion-chars]
     (doctor ?c ?dr)
     (full-name ?c ?name)
     (mean-count ?name :> ?companion-chars))
Creating parallel aggregate operators
Parallel aggregate operators are the most restricted, but they also give the best performance.
Unlike the other aggregators, they can be run in the map phase of the computation. These
aggregators are defined by two functions: one is called on each row, and the other is called to
combine the results of calling the first function on two rows.
This example returns the average length of the name of each doctor's companions:
1. First, you have to define the aggregator functions as named functions. Cascalog
serializes them as names, so you can't use anonymous functions:
(defn mean-init [x] [1 (count x)])
(defn mean-step [n1 t1 n2 t2] [(+ n1 n2) (+ t1 t2)])
2. Then use the vars for these functions to define the parallel aggregator:
(defparallelagg mean-count-p
  :init-var #'mean-init
  :combine-var #'mean-step)
3. The aggregator returns both the item count and the total number of characters,
so you have to divide the two in the query that calls the aggregator:
(?<- (stdout)
     [?dr ?companion-chars]
     (doctor ?c ?dr)
     (full-name ?c ?name)
     (mean-count-p ?name :> ?n ?total)
     (div ?total ?n :> ?companion-chars))
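To see how the two functions cooperate, and why the final division is needed, it can help to simulate the aggregation in plain Clojure without Cascalog. The sketch below is only an illustration: simulate-parallel-agg is a hypothetical helper, not part of Cascalog, and the sample names are made-up input. It assumes the combine function receives the fields of both partial results spliced into one argument list, which is what the four-argument signature of mean-step suggests.

```clojure
;; Plain-Clojure simulation of the init/combine protocol; no Cascalog needed.
;; mean-init and mean-step are copied from the recipe above.
(defn mean-init [x] [1 (count x)])

(defn mean-step [n1 t1 n2 t2] [(+ n1 n2) (+ t1 t2)])

(defn simulate-parallel-agg
  "Hypothetical helper: calls init-fn on every row, then folds the partial
  results together with combine-fn, passing the fields of both partial
  results as one flat argument list."
  [init-fn combine-fn rows]
  (reduce (fn [acc partial] (apply combine-fn (concat acc partial)))
          (map init-fn rows)))

;; Hypothetical sample input standing in for the ?name values of one doctor.
(def sample-names ["Rose Tyler" "Amy Pond" "Donna Noble"])

(let [[n total] (simulate-parallel-agg mean-init mean-step sample-names)]
  ;; The aggregator yields only the pair [n total]; the mean is computed
  ;; afterwards, mirroring the div step in the query.
  (println "count:" n "total:" total "mean:" (double (/ total n))))
```

Because mean-step only adds counts and totals, partial results can be combined in any grouping, which is the property that allows this kind of aggregator to run in the map phase.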
Having so many options for building operators gives you a lot of flexibility and power in
how you define and create queries and transformations in Cascalog. This allows you to create
powerful, custom workflows.
 