4.
The next function takes the current state's cost, a potential new state's cost, and
the current energy in the system (from 0 to 1). It returns the odds that the new state
should be used. Currently, this will always move to an improved state, and to a worse
state 25 percent of the time (both of these odds are prorated by the temperature):
(defn should-move [c0 c1 t]
(* t (if (< c0 c1) 0.25 1.0)))
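To get a feel for the odds this produces, we can call it directly at the REPL; the cost values below are arbitrary, chosen only to show the improved-versus-worse cases:

```clojure
(defn should-move [c0 c1 t]
  (* t (if (< c0 c1) 0.25 1.0)))

;; At full energy (t = 1.0), an improved state always moves,
;; and a worse state moves a quarter of the time.
(should-move 0.58 0.39 1.0)   ; => 1.0
(should-move 0.39 0.58 1.0)   ; => 0.25

;; As the system cools (t = 0.5), both odds shrink proportionally.
(should-move 0.58 0.39 0.5)   ; => 0.5
(should-move 0.39 0.58 0.5)   ; => 0.125
```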
5. The final function parameter takes the current percent through the iteration count
and returns the energy or temperature as a number from 0 to 1. This can use a
number of easing functions, but for this, we'll just use a simple linear one:
(defn get-temp [r] (- 1.0 (float r)))
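Any easing function that maps the progress fraction onto a falling temperature would work here. As a hypothetical alternative (not used in this recipe), a quadratic schedule keeps the system hotter for longer and then cools sharply near the end:

```clojure
;; The linear schedule from step 5: temperature falls evenly.
(defn get-temp [r] (- 1.0 (float r)))

;; A hypothetical quadratic alternative: stays near full energy
;; early on, then drops off quickly toward the end of the run.
(defn get-temp-quadratic [r]
  (let [x (float r)]
    (- 1.0 (* x x))))

(get-temp 0.5)            ; => 0.5
(get-temp-quadratic 0.5)  ; => 0.75
```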
That's it. We can let this find a good partition size. We'll start with the value that we used in
the Partitioning Monte Carlo simulations for better pmap performance recipe. We'll only allow
10 iterations since the search space is relatively small:
user=> (annealing 12 10 nil get-neighbor
(partial get-pi-cost 1000000)
should-move get-temp)
>>> sa 1 . 12 $ 0.5805938333333334
>>> sa 2 . 8 $ 0.38975950000000004
>>> sa 3 . 8 $ 0.38975950000000004
>>> sa 4 . 8 $ 0.38975950000000004
>>> sa 5 . 8 $ 0.38975950000000004
>>> sa 6 . 8 $ 0.38975950000000004
>>> sa 7 . 6 $ 0.357514
>>> sa 8 . 6 $ 0.357514
>>> sa 9 . 6 $ 0.357514
>>> sa 10 . 6 $ 0.357514
[{:state 12, :cost 0.5805938333333334}
{:state 8, :cost 0.38975950000000004}
{:state 6, :cost 0.357514}]
We can see that a partition size of 64 (2⁶) gives the best time, and rerunning the benchmarks
verifies this.
How it works…
In practice, this algorithm won't help if we run it over the full input data. However, if we can
get a large enough sample, this can help us process the full dataset more efficiently by taking
a lot of the guesswork out of picking the right partition size for the full evaluation.
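The `annealing` driver itself is defined earlier in this recipe. A minimal sketch consistent with the call above might look like the following; the parameter order and the unused early-termination argument are assumptions read off the call site, not the book's exact implementation:

```clojure
;; A sketch of a simulated-annealing driver matching the call
;; (annealing initial max-iter terminate? get-neighbor cost-fn
;;            should-move get-temp). The terminate? argument is
;; ignored here, mirroring the nil passed in the example run.
(defn annealing
  [initial max-iter terminate? get-neighbor cost-fn should-move get-temp]
  (loop [i 0, state initial, cost (cost-fn initial), accepted []]
    (if (>= i max-iter)
      accepted
      (let [t         (get-temp (/ (inc i) (double max-iter)))
            candidate (get-neighbor state)
            cand-cost (cost-fn candidate)
            ;; should-move returns the odds of accepting the candidate.
            move?     (< (rand) (should-move cost cand-cost t))
            [state cost] (if move? [candidate cand-cost] [state cost])]
        (println ">>> sa" (inc i) "." state "$" cost)
        (recur (inc i) state cost
               (if move?
                 (conj accepted {:state state, :cost cost})
                 accepted))))))
```

Each iteration generates one neighbor, decides whether to move based on the cooled acceptance odds, and records every accepted state so the caller can inspect the search path, much like the transcript shown above.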