In-Depth Information
Much more could be said about the grid, but space limitations prevent us from
pursuing this topic further. For more information about the grid, see Abramson
(2011), Balasangameshwara and Raju (2012), Celaya and Arronategui (2011), Foster and Kesselman (2003), and Lee et al. (2011).
8.6 SUMMARY
It is getting increasingly difficult to make computers go faster just by revving up the clock, owing to heat-dissipation problems and other factors. Instead,
designers are looking to parallelism for speed-up. Parallelism can be introduced at
many different levels, from very low, where the processing elements are very
tightly coupled, to very high, where they are very loosely coupled.
At the bottom level is on-chip parallelism, in which parallel activities occur on
a single chip. One form of on-chip parallelism is instruction-level parallelism, in
which one instruction or a sequence of instructions issues multiple operations that
can be executed in parallel by different functional units. A second form of on-chip
parallelism is multithreading, in which the CPU can switch back and forth among
multiple threads on an instruction-by-instruction basis, creating a virtual multiprocessor. A third form of on-chip parallelism is the single-chip multiprocessor, in
which two or more cores are placed on the same chip to allow them to run at the
same time.
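To make the thread-level form concrete, a short sketch follows. It is only an illustrative example, not something prescribed above: it assumes a POSIX system with the pthreads library, and the thread count and the summing loop are arbitrary choices. Each thread works on its own slice of the data, so the operating system is free to run the threads on different cores of the chip at the same time.

/* Illustrative sketch: thread-level parallelism on a multicore chip.
 * Assumes a POSIX system; compile with something like:
 *   cc -O2 par_sum.c -lpthread                                      */
#include <pthread.h>
#include <stdio.h>

#define NTHREADS 4              /* arbitrary; often one thread per core */
#define N        1000000L       /* elements summed per thread */

struct task { long start; long partial; };

/* Each thread sums its own range; nothing shared is written,
 * so the threads can run fully in parallel on separate cores. */
static void *worker(void *arg)
{
    struct task *t = arg;
    long sum = 0;
    for (long i = t->start; i < t->start + N; i++)
        sum += i;
    t->partial = sum;
    return NULL;
}

int main(void)
{
    pthread_t tid[NTHREADS];
    struct task tasks[NTHREADS];
    long total = 0;

    for (int i = 0; i < NTHREADS; i++) {
        tasks[i].start = (long)i * N;
        pthread_create(&tid[i], NULL, worker, &tasks[i]);
    }
    for (int i = 0; i < NTHREADS; i++) {
        pthread_join(tid[i], NULL);
        total += tasks[i].partial;
    }
    printf("total = %ld\n", total);
    return 0;
}

Because each thread writes only its own task structure, no locking is needed; the partial results are combined only after pthread_join.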
One level up we find the coprocessors, typically plug-in boards that add extra
processing power in some specialized area such as network protocol processing or
multimedia. These extra processors relieve the main CPU of work, allowing it to
do other things while they are performing their specialized tasks.
At the next level, we find the shared-memory multiprocessors. These systems
contain two or more full-blown CPUs that share a common memory. UMA multiprocessors communicate via a shared (snooping) bus, a crossbar switch, or a multistage switching network. They are characterized by having a uniform access time to all memory locations. In contrast, NUMA multiprocessors also present all processes with the same shared address space, but here remote accesses take appreciably longer than local ones. Finally, COMA multiprocessors are yet another variation, in which cache lines move around the machine on demand but have no real home as in the other designs.
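As a small illustration of what sharing a common memory means to software, the sketch below (again only an assumed example, using POSIX threads and C11 atomics, and not tied to any particular UMA, NUMA, or COMA design) lets one thread deposit a value with an ordinary store and a second thread pick it up with an ordinary load. On a NUMA machine the same code runs unchanged; only the access times differ.

/* Illustrative sketch: two threads communicating through shared memory.
 * Assumes a POSIX system with C11 atomics; compile with something like:
 *   cc -O2 shared.c -lpthread                                          */
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>

static int shared_value;          /* ordinary shared memory */
static atomic_int ready = 0;      /* synchronization flag   */

static void *producer(void *arg)
{
    (void)arg;
    shared_value = 42;                                    /* ordinary store */
    atomic_store_explicit(&ready, 1, memory_order_release);
    return NULL;
}

static void *consumer(void *arg)
{
    (void)arg;
    while (!atomic_load_explicit(&ready, memory_order_acquire))
        ;                                                 /* spin until flagged */
    printf("consumer read %d\n", shared_value);           /* ordinary load  */
    return NULL;
}

int main(void)
{
    pthread_t p, c;
    pthread_create(&c, NULL, consumer, NULL);
    pthread_create(&p, NULL, producer, NULL);
    pthread_join(p, NULL);
    pthread_join(c, NULL);
    return 0;
}

The atomic flag with release/acquire ordering is what guarantees that the consumer sees the producer's store to shared_value; without it, the compiler or the hardware could reorder the two accesses.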
Multicomputers are systems with many CPUs that do not share a common
memory. Each has its own private memory, with communication by message passing. MPPs are large multicomputers with specialized communication networks, such as IBM's BlueGene/L. Clusters are simpler systems using off-the-shelf components, such as the engine that powers Google.
Multicomputers are often programmed using a message-passing package such
as MPI. An alternative approach is to use application-level shared memory such as
a page-based DSM system, the Linda tuple space, or Orca or Globe objects. DSM simulates shared memory at the page level, giving the programmer the illusion of a multiprocessor on top of a multicomputer, although remote references are far more expensive than local ones.
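To give the flavor of the message-passing style, here is a minimal sketch using MPI. It assumes an MPI implementation such as MPICH or Open MPI is installed; the value sent and the message tag are arbitrary. Process 0 sends an integer to process 1, which receives and prints it; all communication is explicit, in contrast to the shared-memory approaches just mentioned.

/* Illustrative sketch: message passing between two processes with MPI.
 * Assumes an MPI installation; compile and run with something like:
 *   mpicc -O2 ping.c -o ping && mpirun -np 2 ./ping                   */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    int rank, value;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        value = 42;
        /* Send one int to rank 1; tag 0 is arbitrary. */
        MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        printf("rank 1 received %d from rank 0\n", value);
    }

    MPI_Finalize();
    return 0;
}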
 
 