Graphically, this process looks something like this:
[Figure: the output of (repeatedly n rand-point) is put into a vec, which is split into partitions (Partition 1, Partition 2, ...). count-items runs over each partition, and the partial counts are combined with + to produce the result.]
The reducers library shows a lot of promise for automatically parallelizing structured operations
with a level of control and simplicity that we haven't seen elsewhere.
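As a rough illustration of the partition, count, and combine flow described above, here is a minimal sketch. The names rand-point and count-items appear in the chart, but the definitions below, along with the in-circle? predicate and the count-in-circle wrapper, are assumptions made for this example and may differ from the ones used in the recipe.

```clojure
(require '[clojure.core.reducers :as r])

;; Assumed helper: a random point in the unit square.
(defn rand-point []
  [(rand) (rand)])

;; Hypothetical predicate: does the point fall inside the unit circle?
(defn in-circle? [[x y]]
  (<= (+ (* x x) (* y y)) 1.0))

;; Reducing function: ignore the item itself and bump the running count.
(defn count-items [c _]
  (inc c))

(defn count-in-circle
  "Counts the random points that satisfy in-circle?, folding over partitions."
  [n]
  (->> (repeatedly n rand-point)
       vec                       ; fold can only split foldable collections such as vectors
       (r/filter in-circle?)
       (r/fold + count-items)))  ; + merges the partition counts; (+) => 0 seeds each partition
```

Here r/fold splits the vector into partitions, reduces each one with count-items, and merges the partial counts with +. Calling + with no arguments returns 0, which is what lets it double as the identity for each partition.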
See also
• We'll see another example with reducers in the next recipe, Generating online
summary statistics for data streams with reducers
Generating online summary statistics for data streams with reducers
We can use reducers in a lot of different situations, but sometimes we need to change how we
process data to do so.
For this example, we'll show you how to compute summary statistics with reducers. We'll use
some algorithms and formulas first proposed by Tony F. Chan, Gene H. Golub, and Randall
J. LeVeque in 1979, and later extended by Timothy B. Terriberry in 2007. These allow you to
approximate the mean, standard deviation, and skew for online data (that is, for streaming data
that we might only see once). So, we will need to compute all of the statistics in one pass
without holding the full collection in memory.
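To make the one-pass idea concrete, here is a minimal sketch of a mergeable accumulator. It uses the pairwise update formulas for the mean and the second and third central moment sums that are commonly attributed to the authors named above. The names empty-stats, combine-stats, add-value, and summary-stats are hypothetical, and this is not necessarily how the recipe itself implements the computation. It also assumes a non-empty collection with non-zero variance.

```clojure
(require '[clojure.core.reducers :as r])

;; Accumulator: item count, running mean, and the sums of squared (m2)
;; and cubed (m3) deviations from the mean.
(def empty-stats {:n 0 :mean 0.0 :m2 0.0 :m3 0.0})

(defn combine-stats
  "Merges two accumulators without revisiting the underlying data."
  ([] empty-stats)
  ([a b]
   (cond
     (zero? (:n a)) b
     (zero? (:n b)) a
     :else
     (let [na    (double (:n a))
           nb    (double (:n b))
           n     (+ na nb)
           delta (- (:mean b) (:mean a))]
       {:n    (+ (:n a) (:n b))
        :mean (+ (:mean a) (/ (* delta nb) n))
        :m2   (+ (:m2 a) (:m2 b)
                 (/ (* delta delta na nb) n))
        :m3   (+ (:m3 a) (:m3 b)
                 (/ (* delta delta delta na nb (- na nb)) (* n n))
                 (/ (* 3.0 delta (- (* na (:m2 b)) (* nb (:m2 a)))) n))}))))

(defn add-value
  "Folds a single observation in by treating it as a one-item accumulator."
  [acc x]
  (combine-stats acc {:n 1 :mean (double x) :m2 0.0 :m3 0.0}))

(defn summary-stats
  "Computes the summary statistics in a single pass over coll."
  [coll]
  (let [{:keys [n mean m2 m3]} (r/fold combine-stats add-value coll)]
    {:n        n
     :mean     mean
     :variance (/ m2 n)                      ; population variance
     :std-dev  (Math/sqrt (/ m2 n))
     :skew     (/ (* (Math/sqrt n) m3)       ; sqrt(n) * m3 / m2^1.5
                  (Math/pow m2 1.5))}))
```

Because combine-stats only needs the two accumulators, the same function serves both as the combining step between partitions and, through add-value, as the per-item reducing step. Nothing beyond the four running values has to be kept in memory.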
 