Amdahl's law is not the only reason perfect speed-up is nearly impossible to
achieve. Nonzero communication latencies, finite communication bandwidths, and
algorithmic inefficiencies can also play a role. Also, even if 1000 CPUs were
available, not all programs can be written to make use of so many CPUs, and the
overhead in getting them all started may be significant. Furthermore, sometimes
the best-known algorithm does not parallelize well, so a suboptimal algorithm must
be used in the parallel case. That said, for many applications, having the program
run n times faster is highly desirable, even if it takes 2n CPUs to do it. CPUs
are not that expensive, after all, and many companies live with considerably less
than 100% efficiency in other parts of their businesses.
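Amdahl's law bounds the speed-up when a fraction p of a program's sequential running time can be spread perfectly over n CPUs and the remaining 1 - p must run on one CPU: speed-up = 1 / ((1 - p) + p/n). The small Python sketch below computes that bound and the resulting efficiency (speed-up divided by the number of CPUs). The function names are illustrative only. The model captures just the serial fraction, so the communication latency, bandwidth, and start-up overheads mentioned above would push real figures lower still.

```python
def amdahl_speedup(parallel_fraction: float, n_cpus: int) -> float:
    """Upper bound on speed-up from Amdahl's law.

    parallel_fraction is the share of the sequential running time that can be
    divided perfectly among n_cpus; the rest runs serially on one CPU.
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be in [0, 1]")
    if n_cpus < 1:
        raise ValueError("n_cpus must be at least 1")
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / n_cpus)


def efficiency(speedup: float, n_cpus: int) -> float:
    """Fraction of the available CPU capacity that turns into speed-up."""
    return speedup / n_cpus


if __name__ == "__main__":
    # Even a 99%-parallel program falls far short of 1000x on 1000 CPUs.
    s = amdahl_speedup(0.99, 1000)
    print(f"speed-up on 1000 CPUs: {s:.1f}, efficiency: {efficiency(s, 1000):.1%}")

    # The trade-off from the text: an n-fold speed-up bought with 2n CPUs.
    n = 8
    print(f"{n}x faster on {2 * n} CPUs: efficiency {efficiency(n, 2 * n):.0%}")
```

With p = 0.99 and 1000 CPUs the bound is only about 91, an efficiency near 9%. The second case is the trade-off described above: an n-fold speed-up obtained with 2n CPUs runs at 50% efficiency.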
Achieving High Performance
The most straightforward way to improve performance is to add more CPUs to
the system. However, this addition must be done in such a way as to avoid creating
any bottlenecks. A system in which one can add more CPUs and get correspondingly
more computing power is said to be scalable.
To see some of the implications of scalability, consider four CPUs connected
by a bus, as illustrated in Fig. 8-51(a). Now imagine scaling the system to 16
CPUs by adding 12 more, as shown in Fig. 8-51(b). If the bandwidth of the bus is
b MB/sec, then by quadrupling the number of CPUs, we have also reduced the
available bandwidth per CPU from b/4 MB/sec to b/16 MB/sec. Such a system is
not scalable.
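The bus arithmetic can be sketched in a few lines of Python, assuming (as an idealization) that the single bus's bandwidth b is divided evenly among all CPUs, with arbitration overhead and caching ignored. The function name is hypothetical, and results are expressed as exact fractions of b.

```python
from fractions import Fraction


def bus_bandwidth_per_cpu(bus_bandwidth: Fraction, n_cpus: int) -> Fraction:
    """Share of one shared bus available to each CPU.

    Assumes an even split among all CPUs; real buses also lose capacity
    to arbitration, which this idealized model ignores.
    """
    if n_cpus < 1:
        raise ValueError("n_cpus must be at least 1")
    return bus_bandwidth / n_cpus


if __name__ == "__main__":
    b = Fraction(1)  # report results as multiples of the bus bandwidth b
    for n in (4, 16, 64):
        share = bus_bandwidth_per_cpu(b, n)
        print(f"{n:3d} CPUs on one bus: {share} b MB/sec per CPU")
```

Going from 4 to 16 CPUs cuts each CPU's share from b/4 to b/16. The total stays fixed at b no matter how many CPUs are attached, because there is only one bus.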
Figure 8-51. (a) A 4-CPU bus-based system. (b) A 16-CPU bus-based system.
(c) A 4-CPU grid-based system. (d) A 16-CPU grid-based system.
Now we do the same thing with a grid-based system, as shown in Fig. 8-51(c)
and Fig. 8-51(d). With this topology, adding new CPUs also adds new links, so
scaling the system up does not cause the aggregate bandwidth per CPU to drop, as
it does with a bus. In fact, the ratio of links to CPUs increases from 1.0 with 4
CPUs (4 CPUs, 4 links) to 1.5 with 16 CPUs (16 CPUs, 24 links), so adding CPUs
improves the aggregate bandwidth per CPU.
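The link counts quoted above can be checked with a short Python sketch. It assumes the grids in Fig. 8-51(c) and (d) are square meshes in which each CPU connects only to its horizontal and vertical neighbours, with no wraparound links. That assumption reproduces the 4 and 24 links given in the text. The function name is illustrative.

```python
def grid_links(rows: int, cols: int) -> int:
    """Number of point-to-point links in a rows x cols mesh.

    Each CPU links to its horizontal and vertical neighbours only; there are
    no wraparound links (adding them would turn the mesh into a torus).
    """
    if rows < 1 or cols < 1:
        raise ValueError("grid dimensions must be positive")
    horizontal = rows * (cols - 1)  # links within each row
    vertical = cols * (rows - 1)    # links within each column
    return horizontal + vertical


if __name__ == "__main__":
    for k in (2, 4, 8):
        cpus = k * k
        links = grid_links(k, k)
        print(f"{cpus:3d} CPUs, {links:3d} links, ratio {links / cpus:.2f}")
```

For a k x k mesh the count is 2k(k - 1), so the links-to-CPUs ratio is 2(k - 1)/k: 1.0 for k = 2, 1.5 for k = 4, and approaching 2 as the grid grows.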