3.
Next, we'll need a way to combine two accumulators. This uses the complete,
unsimplified versions of the formulae from accum-counts. Because some of the
numbers can get very large and can overflow the range of the primitive Java types,
we'll use *'. This is a variant of the multiplication operator that automatically
promotes values to arbitrary-precision integers (Clojure's BigInt) instead of overflowing:
(defn op-fields
  "A utility function that calls a function on
  the values of a field from two maps."
  [op field item1 item2]
  (op (field item1) (field item2)))

(defn combine-counts
  ([] zero-counts)
  ([xa xb]
   (let [n (long (op-fields + :n xa xb))
         delta (op-fields - :mean xb xa)
         nxa*xb (*' (:n xa) (:n xb))]
     {:n n
      :mean (+ (:mean xa) (* delta (/ (:n xb) n)))
      :s (op-fields + :s xa xb)
      :m2 (+ (:m2 xa) (:m2 xb)
             (* delta delta (/ nxa*xb n)))})))
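The difference between * and *' is easiest to see at the edge of the long range. The following is a minimal sketch rather than part of the recipe; the names overflowed? and promoted are introduced here purely for illustration, and it assumes the default checked-math setting.

```clojure
;; With default (checked) math, plain * on longs throws on overflow
;; instead of silently wrapping around.
(def overflowed?
  (try
    (* Long/MAX_VALUE 2)
    false
    (catch ArithmeticException _ true)))

;; *' promotes the result to an arbitrary-precision integer instead.
(def promoted (*' Long/MAX_VALUE 2))

(println "plain * overflowed:" overflowed?)
(println "type returned by *':" (class promoted))
```

In combine-counts, only the product of the two counts is computed with *', since that is the term most likely to exceed the long range when both accumulators cover many items.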
4.
Now, we need a way to take the accumulated counts and values and turn them
into the final statistics:
(defn stats-from-sums [{:keys [n mean m2 s] :as sums}]
  {:mean (double (/ s n))
   :variance (/ m2 (dec n))})
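To see how steps 3 and 4 fit together, here is a small worked example done by hand. It assumes that an accumulator for a chunk holds its count n, its mean, its sum s, and its sum of squared deviations from the mean m2, which is how combine-counts and stats-from-sums use those keys. The chunks [1 2] and [3 4 5] and the subscripts a and b (standing for xa and xb) are chosen here for illustration only.

```latex
% Accumulators for the two chunks
\begin{align*}
n_a &= 2, & \bar{x}_a &= 1.5, & s_a &= 3,  & M_{2,a} &= 0.5 \\
n_b &= 3, & \bar{x}_b &= 4,   & s_b &= 12, & M_{2,b} &= 2
\end{align*}

% Merging them, term by term as in combine-counts
\begin{align*}
n       &= n_a + n_b = 5 \\
\delta  &= \bar{x}_b - \bar{x}_a = 2.5 \\
\bar{x} &= \bar{x}_a + \delta \, \frac{n_b}{n} = 1.5 + 2.5 \cdot \frac{3}{5} = 3 \\
s       &= s_a + s_b = 15 \\
M_2     &= M_{2,a} + M_{2,b} + \delta^2 \, \frac{n_a n_b}{n}
         = 0.5 + 2 + 6.25 \cdot \frac{6}{5} = 10
\end{align*}

% Final statistics, as in stats-from-sums
\begin{align*}
\text{mean} &= \frac{s}{n} = \frac{15}{5} = 3, &
\text{variance} &= \frac{M_2}{n - 1} = \frac{10}{4} = 2.5
\end{align*}
```

Computing directly over 1, 2, 3, 4, 5 gives the same mean of 3 and the same sample variance of 2.5, so the merged accumulator loses nothing relative to a single pass over all the data.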
5.
Finally, we can combine all of these functions to produce results:
(defn summary-statistics [coll]
  (stats-from-sums
    (r/fold combine-counts accum-counts coll)))
For a pointless example, we can use this to find summary statistics on 1,000,000
random numbers:
user=> (summary-statistics (repeatedly 1000000 rand))
{:mean 0.5004908831693459, :variance 0.08346136740444697}
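As a sanity check on that output, rand draws uniformly from the interval [0, 1), so the expected values can be worked out in closed form. The symbol X below is notation introduced here for a single draw.

```latex
\begin{align*}
\mathbb{E}[X] &= \int_0^1 x \, dx = \frac{1}{2} \\
\operatorname{Var}(X) &= \int_0^1 \left(x - \tfrac{1}{2}\right)^2 dx = \frac{1}{12} \approx 0.0833
\end{align*}
```

The reported mean of about 0.5005 and variance of about 0.0835 are both close to these values. Note also that r/fold only splits work across threads for foldable collections such as vectors; a lazy sequence like the one produced by repeatedly is reduced sequentially, so the input would likely need to be wrapped in vec for the two-argument arity of combine-counts to be exercised.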
 