served, multicore does shift more responsibility for performance (and hence energy efficiency)
to the programmer, and the results for the Java workload certainly bear this out.
5.9 Fallacies and Pitfalls
Given the lack of maturity in our understanding of parallel computing, there are many hidden
pitfalls that will be uncovered either by careful designers or by unfortunate ones. Given the
large amount of hype that has surrounded multiprocessors over the years, common fallacies
abound. We have included a selection of these.
Pitfall: Measuring Performance of Multiprocessors by Linear Speedup Versus
Execution Time
“Mortar shot” graphs—plotting performance versus number of processors, showing linear
speedup, a plateau, and then a falling off—have long been used to judge the success of parallel
processors. Although speedup is one facet of a parallel program, it is not a direct measure
of performance. The first question is the power of the processors being scaled: A program
that linearly improves performance to equal 100 Intel Atom processors (the low-end pro-
cessor used for netbooks) may be slower than the version run on an eight-core Xeon. Be es-
pecially careful of floating-point-intensive programs; processing elements without hardware
assist may scale wonderfully but have poor collective performance.
Comparing execution times is fair only if you are comparing the best algorithms on each
computer. Comparing the identical code on two computers may seem fair, but it is not; the
parallel program may be slower on a uniprocessor than a sequential version. Developing a
parallel program will sometimes lead to algorithmic improvements, so comparing the
previously best-known sequential program with the parallel code—which seems fair—will not
compare equivalent algorithms. To reflect this issue, the terms relative speedup (same program)
and true speedup (best program) are sometimes used.
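The distinction between the two definitions can be made concrete with raw timings. The numbers below are hypothetical, chosen only to show how relative and true speedup can diverge for the same parallel run:

```python
# Hypothetical timings in seconds (illustrative, not measured data).
t_parallel_on_1 = 120.0   # the parallel program run on a single processor
t_best_sequential = 80.0  # the best-known sequential program
t_parallel_on_8 = 20.0    # the parallel program on 8 processors

# Relative speedup: same (parallel) program, 1 processor vs. n processors.
relative_speedup = t_parallel_on_1 / t_parallel_on_8   # 6.0

# True speedup: best sequential program vs. the parallel program on n processors.
true_speedup = t_best_sequential / t_parallel_on_8     # 4.0

print(relative_speedup, true_speedup)
```

Relative speedup flatters the parallel program here (6.0 versus 4.0) precisely because the parallel code is slower than the best sequential algorithm when both run on one processor.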
Results that suggest superlinear performance, when a program on n processors is more than
n times faster than the equivalent uniprocessor, may indicate that the comparison is unfair,
although there are instances where “real” superlinear speedups have been encountered. For ex-
ample, some scientific applications regularly achieve superlinear speedup for small increases
in processor count (2 or 4 to 8 or 16). These results usually arise because critical data structures
that do not fit into the aggregate caches of a multiprocessor with 2 or 4 processors fit into the
aggregate cache of a multiprocessor with 8 or 16 processors.
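A toy model, with made-up cache and working-set sizes, shows how a working set spilling out of the aggregate cache can produce such a superlinear jump. All parameters below are illustrative assumptions, not figures from the text:

```python
# Toy model: execution time is a perfectly parallel compute term plus a
# memory-stall term that vanishes once the working set fits in the
# machine's aggregate cache. (Parameter values are hypothetical.)
def run_time(n_procs, compute=100.0, stall=60.0,
             working_set_mb=48, cache_per_proc_mb=8):
    aggregate_cache_mb = n_procs * cache_per_proc_mb
    misses = stall if working_set_mb > aggregate_cache_mb else 0.0
    return compute / n_procs + misses

t4 = run_time(4)          # 100/4 + 60 = 85.0  (48 MB > 32 MB aggregate cache)
t8 = run_time(8)          # 100/8 + 0  = 12.5  (48 MB fits in 64 MB)
speedup_4_to_8 = t4 / t8  # 6.8x from doubling processors: "superlinear"
```

Doubling the processor count here yields a 6.8x improvement, far more than the 2x that perfect parallel scaling alone could deliver, because the 8-processor machine also eliminates the capacity misses.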
In summary, comparing performance by comparing speedups is at best tricky and at worst
misleading. Comparing the speedups for two different multiprocessors does not necessarily
tell us anything about the relative performance of the multiprocessors. Even comparing two
different algorithms on the same multiprocessor is tricky, since we must use true speedup,
rather than relative speedup, to obtain a valid comparison.
Fallacy: Amdahl's Law Doesn't Apply to Parallel Computers
In 1987, the head of a research organization claimed that Amdahl's law (see Section 1.9) had
been broken by an MIMD multiprocessor. This statement hardly meant, however, that the law
has been overturned for parallel computers; the neglected portion of the program will still