Introduction
If concurrent processing affects performance as a side effect of how we structure our programs, parallel processing is a way to get better performance that, in turn, has implications for how we structure our programs. Although they are often conflated, concurrent processing and parallel processing are different solutions to different problems. Concurrency is good for expressing programs that involve different tasks that can be, or must be, carried out at the same time. Parallelization is a good option if you want to perform the same task many times, all at once. Parallelization is not necessary, but it can help tremendously with your program's performance.
Earlier, the easiest, and often best, strategy to improve performance was to go on vacation. Moore's law implied that processor speeds would double approximately every 18 months, so in the 1990s, we could go on vacation, return, buy a new computer, and our programs were faster. This was magic.
Today, we're no longer riding Moore's law; as the saying goes, the free lunch is over. Processor speeds have plateaued or even declined. Instead, computers are made faster by packing more processors into them. To make use of these processors, we have to employ parallel programming.
Of course, the processor isn't always the slowest part of the program (that is, our programs aren't always CPU bound). Sometimes, it's the disk or the network that limits how fast our programs run. If that's the case, we have to read from multiple disks or network connections simultaneously in order to see any improvement in speed. For example, reading a single file from different processors might even be slower, but if you can copy the file onto different disks and read each copy from a separate processor, it will in all likelihood be faster.
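As a minimal sketch of that idea (the file paths here are hypothetical, and futures are just one way to overlap the reads; this is not code from the recipes themselves):

;; Hypothetical file paths on different disks; each future runs on its
;; own thread, so the reads can overlap instead of waiting on one another.
(def paths ["/disk1/data.csv" "/disk2/data.csv" "/disk3/data.csv"])

(def file-futures
  (mapv #(future (slurp %)) paths))   ; start all of the reads eagerly

(def contents
  (mapv deref file-futures))          ; block until each read finishes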
The recipes in this chapter focus on leveraging multiple cores by showing different ways to
parallelize Clojure programs. It also includes a few recipes on related topics. For instance,
consider the Using type hints recipe, which talks about how to optimize our code, and the
Benchmarking with Criterium recipe, which discusses how to gather good data to guide
that optimization.
Parallelizing processing with pmap
The easiest way to parallelize data processing is to take a loop you already have and handle each item in it on a separate thread.
This is essentially what pmap does. If you replace a call to map with pmap, each application of the function is executed on a thread pool. pmap is not completely lazy, but it's not completely strict either. Instead, it stays just ahead of the output being consumed, so if the output is never used, the sequence won't be fully realized.
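As a quick illustration (slow-inc is a made-up, CPU-bound stand-in for real work, not a function from this recipe), replacing map with pmap is all it takes:

;; A hypothetical function that simulates a slow, CPU-bound computation.
(defn slow-inc [n]
  (Thread/sleep 100)
  (inc n))

;; Sequential: each item is processed one after another.
(time (doall (map slow-inc (range 16))))

;; Parallel: the same call with pmap runs each application on a thread pool,
;; so on a multicore machine this should finish in a fraction of the time.
(time (doall (pmap slow-inc (range 16))))

The doall calls are only there to force the lazy sequences so that time measures the whole computation.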
 