served, multicore does shift more responsibility for performance (and hence energy efficiency)
to the programmer, and the results for the Java workload certainly bear this out.
5.9 Fallacies and Pitfalls
Given the lack of maturity in our understanding of parallel computing, there are many hidden
pitfalls that will be uncovered either by careful designers or by unfortunate ones. Given the
large amount of hype that has surrounded multiprocessors over the years, common fallacies
abound. We have included a selection of these.
Pitfall: Measuring Performance of Multiprocessors by Linear Speedup Versus
Execution Time
“Mortar shot” graphs—plotting performance versus number of processors, showing linear
speedup, a plateau, and then a falling off—have long been used to judge the success of parallel
processors. Although speedup is one facet of a parallel program, it is not a direct measure
of performance. The first question is the power of the processors being scaled: A program
that linearly improves performance to equal 100 Intel Atom processors (the low-end pro-
cessor used for netbooks) may be slower than the version run on an eight-core Xeon. Be es-
pecially careful of floating-point-intensive programs; processing elements without hardware
assist may scale wonderfully but have poor collective performance.
Comparing execution times is fair only if you are comparing the best algorithms on each
computer. Comparing the identical code on two computers may seem fair, but it is not; the
parallel program may be slower on a uniprocessor than a sequential version. Developing a
parallel program will sometimes lead to algorithmic improvements, so comparing the
previously best-known sequential program with the parallel code—which seems fair—will not
compare equivalent algorithms. To reflect this issue, the terms relative speedup (same program)
and true speedup (best program) are sometimes used.
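The distinction between the two definitions can be made concrete with raw timings. The numbers below are hypothetical, chosen only to show how relative and true speedup can diverge for the same parallel run:

```python
# Hypothetical timings in seconds (illustrative, not measured data).
t_parallel_on_1 = 120.0   # the parallel program run on a single processor
t_best_sequential = 80.0  # the best-known sequential program
t_parallel_on_8 = 20.0    # the parallel program on 8 processors

# Relative speedup: same (parallel) program, 1 processor vs. n processors.
relative_speedup = t_parallel_on_1 / t_parallel_on_8   # 6.0

# True speedup: best sequential program vs. the parallel program on n processors.
true_speedup = t_best_sequential / t_parallel_on_8     # 4.0

print(relative_speedup, true_speedup)
```

Relative speedup flatters the parallel program here (6.0 versus 4.0) precisely because the parallel code is slower than the best sequential algorithm when both run on one processor.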
Results that suggest superlinear performance, when a program on n processors is more than
n times faster than the equivalent uniprocessor, may indicate that the comparison is unfair,
although there are instances where “real” superlinear speedups have been encountered. For ex-
ample, some scientific applications regularly achieve superlinear speedup for small increases
in processor count (2 or 4 to 8 or 16). These results usually arise because critical data structures
that do not fit into the aggregate caches of a multiprocessor with 2 or 4 processors fit into the
aggregate cache of a multiprocessor with 8 or 16 processors.
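A toy model, with made-up cache and working-set sizes, shows how a working set spilling out of the aggregate cache can produce such a superlinear jump. All parameters below are illustrative assumptions, not figures from the text:

```python
# Toy model: execution time is a perfectly parallel compute term plus a
# memory-stall term that vanishes once the working set fits in the
# machine's aggregate cache. (Parameter values are hypothetical.)
def run_time(n_procs, compute=100.0, stall=60.0,
             working_set_mb=48, cache_per_proc_mb=8):
    aggregate_cache_mb = n_procs * cache_per_proc_mb
    misses = stall if working_set_mb > aggregate_cache_mb else 0.0
    return compute / n_procs + misses

t4 = run_time(4)          # 100/4 + 60 = 85.0  (48 MB > 32 MB aggregate cache)
t8 = run_time(8)          # 100/8 + 0  = 12.5  (48 MB fits in 64 MB)
speedup_4_to_8 = t4 / t8  # 6.8x from doubling processors: "superlinear"
```

Doubling the processor count here yields a 6.8x improvement, far more than the 2x that perfect parallel scaling alone could deliver, because the 8-processor machine also eliminates the capacity misses.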
In summary, comparing performance by comparing speedups is at best tricky and at worst
misleading. Comparing the speedups for two different multiprocessors does not necessarily
tell us anything about the relative performance of the multiprocessors. Even comparing two
different algorithms on the same multiprocessor is tricky, since we must use true speedup,
rather than relative speedup, to obtain a valid comparison.
Fallacy: Amdahl's Law Doesn't Apply to Parallel Computers
In 1987, the head of a research organization claimed that Amdahl's law (see Section 1.9) had
been broken by an MIMD multiprocessor. This statement hardly meant, however, that the law
has been overturned for parallel computers; the neglected portion of the program will still