limit performance. To understand the basis of the media reports, let's see what Amdahl [1967]
originally said:
A fairly obvious conclusion which can be drawn at this point is that the effort expended on
achieving high parallel processing rates is wasted unless it is accompanied by achievements in sequential processing rates of very nearly the same magnitude. [p. 483]
One interpretation of the law was that, since portions of every program must be sequential,
there is a limit to the useful economic number of processors—say, 100. By showing linear speedup with 1000 processors, this interpretation of Amdahl's law was disproved.
The basis for the statement that Amdahl's law had been “overcome” was the use of scaled
speedup, also called weak scaling. The researchers scaled the benchmark to have a dataset size
that was 1000 times larger and compared the uniprocessor and parallel execution times of the
scaled benchmark. For this particular algorithm, the sequential portion of the program was
constant independent of the size of the input, and the rest was fully parallel—hence, linear
speedup with 1000 processors. Because the running time grew faster than linearly, the program
actually ran longer after scaling, even with 1000 processors.
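To make the distinction concrete, the following sketch (illustrative only, not taken from the original study) contrasts true speedup on a fixed problem size with scaled speedup under weak scaling, assuming a sequential portion that stays constant while the parallel work grows with the processor count:

# Sketch: true (fixed-size) speedup vs. scaled (weak-scaling) speedup.
# Assumed workload: sequential fraction s and parallel fraction p of the
# single-processor running time, with s + p = 1.

def true_speedup(s, p, n):
    # Amdahl's law: same problem, parallel part divided among n processors.
    return (s + p) / (s + p / n)

def scaled_speedup(s, p, n):
    # Weak scaling: parallel work grows by a factor of n, sequential part fixed.
    # Ratio of one-processor time to n-processor time for the scaled problem.
    return (s + n * p) / (s + p)

s, p = 0.01, 0.99   # assumed: 1% sequential, 99% parallel
for n in (100, 1000):
    print(n, round(true_speedup(s, p, n), 1), round(scaled_speedup(s, p, n), 1))
# True speedup saturates near 1/s = 100, while scaled speedup stays close to n,
# which is how weak scaling can appear to "overcome" Amdahl's law.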
Speedup that assumes scaling of the input is not the same as true speedup, and reporting it as if it were is misleading. Since parallel benchmarks are often run on different-sized multiprocessors, it is important to specify what type of application scaling is permissible and how that
scaling should be done. Although simply scaling the data size with processor count is rarely
appropriate, assuming a fixed problem size for a much larger processor count (called strong scaling) is often inappropriate, as well, since it is likely that users given a much larger multiprocessor would opt to run a larger or more detailed version of an application. See Appendix
I for more discussion on this important topic.
Fallacy: Linear Speedups Are Needed to Make Multiprocessors Cost-Effective
It is widely recognized that one of the major benefits of parallel computing is to offer a “shorter
time to solution” than the fastest uniprocessor. Many people, however, also hold the view that
parallel processors cannot be as cost-effective as uniprocessors unless they can achieve perfect linear speedup. This argument says that, because the cost of the multiprocessor is a linear function of the number of processors, anything less than linear speedup means that the performance/cost ratio decreases, making a parallel processor less cost-effective than using a uniprocessor.
The problem with this argument is that cost is not only a function of processor count but
also depends on memory, I/O, and the overhead of the system (box, power supply, interconnect, and so on). It also makes less sense in the multicore era, when there are multiple processors per chip.
The effect of including memory in the system cost was pointed out by Wood and Hill [1995].
We use an example based on more recent data using TPC-C and SPECRate benchmarks, but
the argument could also be made with a parallel scientific application workload, which would
likely make the case even stronger.
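As a minimal sketch of the underlying comparison (with made-up costs rather than real pricing): Wood and Hill compare speedup against "costup," the cost of the n-processor system relative to the single-processor system. Once a fixed base cost for memory, I/O, and packaging is included, costup grows more slowly than linearly, so a sublinear speedup can still win on performance per dollar:

# Sketch: cost-effectiveness with a fixed base system cost (illustrative numbers).
# A multiprocessor improves performance per dollar whenever its speedup
# exceeds its costup relative to the one-processor configuration.

BASE_COST = 20000   # assumed: box, power supply, interconnect, memory, I/O
PER_PROC = 1000     # assumed: incremental cost per processor

def costup(n):
    return (BASE_COST + n * PER_PROC) / (BASE_COST + PER_PROC)

n, speedup = 64, 40          # suppose 64 processors give only a 40x speedup
print(f"costup({n}) = {costup(n):.1f}, speedup = {speedup}")
# costup(64) = 4.0, so even this sublinear speedup delivers roughly 10x the
# performance per dollar of the uniprocessor.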
Figure 5.32 shows the speedup for TPC-C, SPECintRate, and SPECfpRate on an IBM eServer
p5 multiprocessor configured with 4 to 64 processors. The figure shows that only TPC-C
achieves better than linear speedup. For SPECintRate and SPECfpRate, speedup is less than linear.