Introduction
If concurrent processing affects performance as a side effect of how we structure our programs, parallel processing is a way to get better performance that, in turn, has implications for how we structure our programs. Although they are often conflated, concurrent processing and parallel processing are different solutions to different problems. Concurrency is good for expressing programs that involve different tasks that can be, or must be, carried out at the same time. Parallelization is a good option if you want to perform the same task many times, all at once. Parallelization is not necessary, but it can help tremendously with your program's performance.
Earlier, the easiest, and often best, strategy to improve performance was to go on vacation. Moore's law implied that processor speeds would double approximately every 18 months, so in the 1990s, we could go on vacation, return, buy a new computer, and our programs were faster. This was magic.
Today, we're no longer riding Moore's law; as the saying goes, the free lunch is over. Processor speeds have plateaued or even declined. Instead, computers are made faster by packing more processors into them. To make use of these processors, we have to employ parallel programming.
Of course, the processor isn't always the slowest part of the program (that is, our programs aren't always CPU bound). Sometimes, it's the disk or the network that limits how fast our programs run. If that's the case, we have to read from multiple disks or network connections simultaneously in order to see any improvement in speed. For example, reading a single file from different processors might even be slower, but if you can copy the file onto different disks and read each copy from a separate processor, it will in all likelihood be faster.
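As a minimal sketch of that idea (the file paths here are hypothetical, and futures are just one way to overlap the reads; this is not code from the recipes themselves):

;; Hypothetical file paths on different disks; each future runs on its
;; own thread, so the reads can overlap instead of waiting on one another.
(def paths ["/disk1/data.csv" "/disk2/data.csv" "/disk3/data.csv"])

(def file-futures
  (mapv #(future (slurp %)) paths))   ; start all of the reads eagerly

(def contents
  (mapv deref file-futures))          ; block until each read finishes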
The recipes in this chapter focus on leveraging multiple cores by showing different ways to
parallelize Clojure programs. It also includes a few recipes on related topics. For instance,
consider the Using type hints recipe, which talks about how to optimize our code, and the
Benchmarking with Criterium recipe, which discusses how to gather good data to guide
that optimization.
Parallelizing processing with pmap
The easiest way to parallelize data processing is to take a loop you already have and handle each item in it on a separate thread.
This is essentially what pmap does. If you replace a call to map with pmap, each application of the function is executed on a thread pool. pmap is not completely lazy, but it's not completely strict either. Instead, it stays just ahead of the output being consumed, so if the output is never used, the sequence won't be fully realized.
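As a quick illustration (slow-inc is a made-up, CPU-bound stand-in for real work, not a function from this recipe), replacing map with pmap is all it takes:

;; A hypothetical function that simulates a slow, CPU-bound computation.
(defn slow-inc [n]
  (Thread/sleep 100)
  (inc n))

;; Sequential: each item is processed one after another.
(time (doall (map slow-inc (range 16))))

;; Parallel: the same call with pmap runs each application on a thread pool,
;; so on a multicore machine this should finish in a fraction of the time.
(time (doall (pmap slow-inc (range 16))))

The doall calls are only there to force the lazy sequences so that time measures the whole computation.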
 