Performance and Scalability - Java Concurrency in Practice

Java Reference

In-Depth Information

synchronized LinkedList shows some improvement up to three threads, but then falls off

as synchronization overhead increases. By the time it gets to four or five threads, contention

is so heavy that every access to the queue lock is contended and throughput is dominated by

context switching.

The difference in throughput comes from differing degrees of serialization between the

two queue implementations. The synchronized LinkedList guards the entire queue state

with a single lock that is held for the duration of the offer or remove call; Concur-

rentLinkedQueue uses a sophisticated nonblocking queue algorithm (see Section 15.4.2 )

that uses atomic references to update individual link pointers. In one, the entire insertion or

removal is serialized; in the other, only updates to individual pointers are serialized.

11.2.2. Applying Amdahl's Law Qualitatively

Amdahl's law quantifies the possible speedup when more computing resources are available,

if we can accurately estimate the fraction of execution that is serialized. Although measuring

serialization directly can be difficult, Amdahl's law can still be useful without such measure-

ment.

Since our mental models are influenced by our environment, many of us are used to thinking

that a multiprocessor system has two or four processors, or maybe (if we've got a big budget)

as many as a few dozen, because this is the technology that has been widely available in re-

cent years. But as multicore CPUs become mainstream, systems will have hundreds or even

thousands of processors. [3] Algorithms that seem scalable on a four-way system may have

hidden scalability bottlenecks that have just not yet been encountered.

When evaluating an algorithm, thinking “in the limit” about what would happen with hun-

dreds or thousands of processors can offer some insight into where scaling limits might ap-

pear. For example, Sections 11.4.2 and 11.4.3 discuss two techniques for reducing lock gran-

ularity: lock splitting (splitting one lock into two) and lock striping (splitting one lock into

many). Looking at them through the lens of Amdahl's law, we see that splitting a lock in two

does not get us very far towards exploiting many processors, but lock striping seems much

more promising because the size of the stripe set can be increased as processor count in-

creases. (Of course, performance optimizations should always be considered in light of actual

performance requirements; in some cases, splitting a lock in two may be enough to meet the

requirements.)

Search WWH ::

Custom Search

Home