Overhead used : 1.875845 ns
Found 1 outliers in 6 samples (16.6667 %)
low-severe 1 (16.6667 %)
Variance from outliers : 13.8889 % Variance is moderately
inflated by outliers
nil
user=> (quick-bench (mc-pi-r 1000000))
WARNING: Final GC required 8.023979507099268 % of runtime
Evaluation count : 6 in 6 samples of 1 calls.
Execution time mean : 168.998166 ms
Execution time std-deviation : 3.615209 ms
Execution time lower quantile : 164.074999 ms ( 2.5%)
Execution time upper quantile : 173.148749 ms (97.5%)
Overhead used : 1.875845 ns
nil
Not bad. On eight cores, the version without reducers is almost six times slower. This is all the more impressive because we made only minor changes to the original code, especially compared to the version of this algorithm that partitioned the input before passing it to pmap , which we saw in the Partitioning Monte Carlo simulations for better pmap performance recipe.
How it works…
The reducers library does a couple of things in this recipe. Let's take a look at some lines from count-in-circle-r . Converting the input to a vector is important, because vectors can be folded in parallel, while generic sequences cannot.
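A minimal sketch of this distinction (the data here is made up for illustration): r/fold only parallelizes foldable collections such as vectors; on a lazy sequence it silently falls back to a serial reduce.

```clojure
(require '[clojure.core.reducers :as r])

;; A lazy sequence is not foldable, so this runs serially:
(r/fold + (map inc (range 1000000)))

;; Converting to a vector first lets r/fold split the work
;; into chunks and combine them with a fork-join pool:
(r/fold + (vec (map inc (range 1000000))))
```

Both expressions return the same sum; only the second can use multiple cores.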
Next, these two calls are fused into one reducer function that doesn't create an intermediate sequence between the call to r/map and r/filter . This is a small, but important, optimization, especially if we stacked more functions into this stage of the process:
(r/map center-dist)
(r/filter #(<= % 1.0))
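To see how these two calls compose, here is a hedged sketch of how they might appear together; center-dist is reconstructed here as a simple distance-from-origin function, and the random points are made up for illustration:

```clojure
(require '[clojure.core.reducers :as r])

;; Assumed reconstruction of center-dist: the distance of a
;; point [x y] from the origin.
(defn center-dist [[x y]]
  (Math/sqrt (+ (* x x) (* y y))))

;; r/map and r/filter return reducibles, not sequences, so no
;; intermediate collection is built between the two steps; the
;; work happens only when something reduces or folds the result.
(->> (vec (repeatedly 1000 #(vector (rand) (rand))))
     (r/map center-dist)
     (r/filter #(<= % 1.0)))
```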
The bigger optimization is in the line with r/fold . r/reduce always processes serially, but if the input is a tree-based data structure, r/fold will employ a fork-join pattern to parallelize it. This line takes the place of a call to count : the count-items reducing function increments a counter for every item, and + combines the per-partition counts:
(r/fold + count-items))))
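The definition of count-items isn't shown in this excerpt; a plausible reconstruction (an assumption, not the book's exact code) is a reducing function that ignores each item and bumps an accumulator. With + as the combining function, r/fold can then count the items of each partition in parallel and sum the results:

```clojure
(require '[clojure.core.reducers :as r])

;; Hypothetical reconstruction: ignore the item, increment the count.
(defn count-items [acc _]
  (inc acc))

;; (+) with no arguments yields 0, which r/fold uses as the
;; initial accumulator for each partition; + then merges the
;; partition counts.
(r/fold + count-items (vec (range 10)))
;; => 10
```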