In-Depth Information
Much more could be said about the grid, but space limitations prevent us from
pursuing this topic further. For more information about the grid, see Abramson
(2011), Balasangameshwara and Raju (2012), Celaya and Arronategui (2011), Foster and Kesselman (2003), and Lee et al. (2011).
8.6 SUMMARY
It is getting increasingly difficult to make computers go faster just by revving up the clock, owing to heat-dissipation problems and other factors. Instead,
designers are looking to parallelism for speed-up. Parallelism can be introduced at
many different levels, from very low, where the processing elements are very
tightly coupled, to very high, where they are very loosely coupled.
At the bottom level is on-chip parallelism, in which parallel activities occur on
a single chip. One form of on-chip parallelism is instruction-level parallelism, in
which one instruction or a sequence of instructions issues multiple operations that
can be executed in parallel by different functional units. A second form of on-chip
parallelism is multithreading, in which the CPU can switch back and forth among
multiple threads on an instruction-by-instruction basis, creating a virtual multiprocessor. A third form of on-chip parallelism is the single-chip multiprocessor, in
which two or more cores are placed on the same chip to allow them to run at the
same time.
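To make the thread-level form concrete, a short sketch follows. It is only an illustrative example, not something prescribed above: it assumes a POSIX system with the pthreads library, and the thread count and the summing loop are arbitrary choices. Each thread works on its own slice of the data, so the operating system is free to run the threads on different cores of the chip at the same time.

/* Illustrative sketch: thread-level parallelism on a multicore chip.
 * Assumes a POSIX system; compile with something like:
 *   cc -O2 par_sum.c -lpthread                                      */
#include <pthread.h>
#include <stdio.h>

#define NTHREADS 4              /* arbitrary; often one thread per core */
#define N        1000000L       /* elements summed per thread */

struct task { long start; long partial; };

/* Each thread sums its own range; nothing shared is written,
 * so the threads can run fully in parallel on separate cores. */
static void *worker(void *arg)
{
    struct task *t = arg;
    long sum = 0;
    for (long i = t->start; i < t->start + N; i++)
        sum += i;
    t->partial = sum;
    return NULL;
}

int main(void)
{
    pthread_t tid[NTHREADS];
    struct task tasks[NTHREADS];
    long total = 0;

    for (int i = 0; i < NTHREADS; i++) {
        tasks[i].start = (long)i * N;
        pthread_create(&tid[i], NULL, worker, &tasks[i]);
    }
    for (int i = 0; i < NTHREADS; i++) {
        pthread_join(tid[i], NULL);
        total += tasks[i].partial;
    }
    printf("total = %ld\n", total);
    return 0;
}

Because each thread writes only its own task structure, no locking is needed; the partial results are combined only after pthread_join.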
One level up we find the coprocessors, typically plug-in boards that add extra
processing power in some specialized area such as network protocol processing or
multimedia. These extra processors relieve the main CPU of work, allowing it to
do other things while they are performing their specialized tasks.
At the next level, we find the shared-memory multiprocessors. These systems
contain two or more full-blown CPUs that share a common memory. UMA multiprocessors communicate via a shared (snooping) bus, a crossbar switch, or a multistage switching network. They are characterized by having a uniform access time to all memory locations. In contrast, NUMA multiprocessors also present all processes with the same shared address space, but here remote accesses take appreciably longer than local ones. Finally, COMA multiprocessors are yet another variation, in which cache lines move around the machine on demand but have no real home as in the other designs.
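As a small illustration of what sharing a common memory means to software, the sketch below (again only an assumed example, using POSIX threads and C11 atomics, and not tied to any particular UMA, NUMA, or COMA design) lets one thread deposit a value with an ordinary store and a second thread pick it up with an ordinary load. On a NUMA machine the same code runs unchanged; only the access times differ.

/* Illustrative sketch: two threads communicating through shared memory.
 * Assumes a POSIX system with C11 atomics; compile with something like:
 *   cc -O2 shared.c -lpthread                                          */
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>

static int shared_value;          /* ordinary shared memory */
static atomic_int ready = 0;      /* synchronization flag   */

static void *producer(void *arg)
{
    (void)arg;
    shared_value = 42;                                    /* ordinary store */
    atomic_store_explicit(&ready, 1, memory_order_release);
    return NULL;
}

static void *consumer(void *arg)
{
    (void)arg;
    while (!atomic_load_explicit(&ready, memory_order_acquire))
        ;                                                 /* spin until flagged */
    printf("consumer read %d\n", shared_value);           /* ordinary load  */
    return NULL;
}

int main(void)
{
    pthread_t p, c;
    pthread_create(&c, NULL, consumer, NULL);
    pthread_create(&p, NULL, producer, NULL);
    pthread_join(p, NULL);
    pthread_join(c, NULL);
    return 0;
}

The atomic flag with release/acquire ordering is what guarantees that the consumer sees the producer's store to shared_value; without it, the compiler or the hardware could reorder the two accesses.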
Multicomputers are systems with many CPUs that do not share a common
memory. Each has its own private memory, with communication by message passing. MPPs are large multicomputers with specialized communication networks, such as IBM's BlueGene/L. Clusters are simpler systems using off-the-shelf components, such as the engine that powers Google.
Multicomputers are often programmed using a message-passing package such
as MPI. An alternative approach is to use application-level shared memory such as
a page-based DSM system, the Linda tuple space, or Orca or Globe objects. DSM simulates shared memory at the page level, giving the programmer the illusion of a multiprocessor on top of a multicomputer, although remote references are far more expensive than local ones.
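To give the flavor of the message-passing style, here is a minimal sketch using MPI. It assumes an MPI implementation such as MPICH or Open MPI is installed; the value sent and the message tag are arbitrary. Process 0 sends an integer to process 1, which receives and prints it; all communication is explicit, in contrast to the shared-memory approaches just mentioned.

/* Illustrative sketch: message passing between two processes with MPI.
 * Assumes an MPI installation; compile and run with something like:
 *   mpicc -O2 ping.c -o ping && mpirun -np 2 ./ping                   */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    int rank, value;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        value = 42;
        /* Send one int to rank 1; tag 0 is arbitrary. */
        MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        printf("rank 1 received %d from rank 0\n", value);
    }

    MPI_Finalize();
    return 0;
}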
 
 