Improving Performance with Parallel Programming - Clojure Data Analysis

A way to get around this is to make sure that pmap has enough to do at each step it parallelizes. The easiest way to do this is to partition the input collection into chunks and run pmap on groups of the input.
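The partitioning idea can be sketched in a few lines. This is only a minimal illustration, not the version the recipe builds: chunked-pmap and its chunk-size parameter are hypothetical names introduced here, and a good chunk size depends on how expensive each item is to process.

```clojure
;; Hypothetical helper (not from the recipe): apply f to every item of coll,
;; but hand pmap whole chunks so that each parallel task has enough work.
(defn chunked-pmap
  [f chunk-size coll]
  (->> coll
       (partition-all chunk-size)        ; split the input into chunks
       (pmap (fn [chunk]
               (doall (map f chunk))))   ; realize each chunk inside its task
       (apply concat)))                  ; flatten back into a single sequence

;; Produces the same items, in the same order, as (map inc (range 10)).
(chunked-pmap inc 4 (range 10))
```

The doall matters here: map is lazy, so without it each parallel task would return an unrealized sequence and the real work would happen later on the consuming thread.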

For this recipe, we'll use Monte Carlo methods to approximate pi. We'll compare a serial version against a naïve parallel version as well as a version that uses parallelization and partitions.

Monte Carlo methods work by attacking a deterministic problem, such as computing pi, nondeterministically. That is, we'll take a nonrandom problem and throw random data at it in order to compute the results. We'll see how this works and go into more detail on what this means toward the end of this recipe.
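As a brief preview of why random points can estimate pi, here is the underlying arithmetic. It assumes the points are drawn uniformly from the unit square and that "inside the circle" means within distance 1 of the origin; that threshold is an assumption here, since it is not stated at this point in the text.

```latex
% The quarter of the unit circle lying in the unit square has area \pi/4,
% and the square has area 1, so for a uniformly random point (x, y):
\[
  P\bigl(x^2 + y^2 \le 1\bigr) = \frac{\pi/4}{1} = \frac{\pi}{4}
  \quad\Longrightarrow\quad
  \pi \approx 4 \cdot \frac{\text{number of points inside the circle}}{n}
\]
```

The approximation improves as the number of points n grows, which is what makes the problem a natural candidate for parallel work.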

Getting ready

We'll use Criterium (https://github.com/hugoduncan/criterium) to handle benchmarking, so we'll need to include it as a dependency in our Leiningen project.clj file:

(defproject parallel-data "0.1.0"
  :dependencies [[org.clojure/clojure "1.6.0"]
                 [criterium "0.4.3"]])

We'll also use Criterium and the java.lang.Math class in our script or REPL:

(use 'criterium.core)

(import [java.lang Math])

How to do it…

To implement this, we'll define some core functions and then implement a Monte Carlo method that uses pmap to estimate pi.

1. We need to define the functions that are necessary for the simulation. We'll have one function that generates a random two-dimensional point, which will fall somewhere in the unit square:

(defn rand-point [] [(rand) (rand)])

2. Now, we need a function to return a point's distance from the origin:

(defn center-dist [[x y]]
  (Math/sqrt (+ (* x x) (* y y))))

3. Next, we'll define a function that takes a number of points to process and creates that many random points. It will return the number of points that fall inside a circle:

(defn count-in-circle [n]
  (->> (repeatedly n rand-point)
       (map center-dist)
