FIGURE 5.2 The basic architecture of a distributed-memory multiprocessor in 2011 typically
consists of a multicore multiprocessor chip with memory and possibly I/O attached and an
interface to an interconnection network that connects all the nodes. Each processor core
shares the entire memory, although the access time to the local memory attached to the
core's chip will be much faster than the access time to remote memories.
Distributing the memory among the nodes both increases the bandwidth and reduces the
latency to local memory. A DSM multiprocessor is also called a NUMA (nonuniform memory
access), since the access time depends on the location of a data word in memory. The key
disadvantages for a DSM are that communicating data among processors becomes somewhat
more complex, and a DSM requires more effort in the software to take advantage of the
increased memory bandwidth afforded by distributed memories. Because all multicore-based
multiprocessors with more than one processor chip (or socket) use distributed memory, we
will explain the operation of distributed memory multiprocessors from this viewpoint.
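As an illustrative sketch (not from the original text), the C fragment below uses the Linux libnuma interface to place one buffer on the node of the running core and another on a different node; the buffer size and node choice are arbitrary assumptions. On a DSM machine, timing the two loops would expose the nonuniform access latencies just described.

#define _GNU_SOURCE
#include <sched.h>     /* sched_getcpu() */
#include <stdio.h>
#include <stdlib.h>
#include <numa.h>      /* libnuma; compile with -lnuma */

int main(void) {
    if (numa_available() < 0) {        /* kernel or hardware without NUMA support */
        fprintf(stderr, "NUMA not available\n");
        return 1;
    }
    size_t bytes = 64UL * 1024 * 1024;                   /* 64 MiB, arbitrary size */
    int local_node  = numa_node_of_cpu(sched_getcpu());  /* node of this core's chip */
    int remote_node = numa_max_node();   /* another node (equals local_node on a
                                            single-socket machine) */

    char *local_buf  = numa_alloc_onnode(bytes, local_node);
    char *remote_buf = numa_alloc_onnode(bytes, remote_node);

    /* Touch one byte per page: accesses to remote_buf cross the
       interconnection network and therefore take longer than local ones. */
    for (size_t i = 0; i < bytes; i += 4096) {
        local_buf[i]  = 1;
        remote_buf[i] = 1;
    }

    numa_free(local_buf, bytes);
    numa_free(remote_buf, bytes);
    return 0;
}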
In both SMP and DSM architectures, communication among threads occurs through a
shared address space, meaning that a memory reference can be made by any processor to any
memory location, assuming it has the correct access rights. The term shared memory associated
with both SMP and DSM refers to the fact that the address space is shared.
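To make the shared-address-space model concrete, here is a minimal POSIX-threads sketch (our own example, not from the text): the two threads communicate simply by writing and reading the same memory location, with a mutex providing the ordering; no explicit message is ever sent.

#include <pthread.h>
#include <stdio.h>

static int shared_value;   /* one address, visible to every thread          */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

static void *producer(void *arg) {
    (void)arg;
    pthread_mutex_lock(&lock);
    shared_value = 42;     /* communicate by storing to the shared location */
    pthread_mutex_unlock(&lock);
    return NULL;
}

int main(void) {           /* compile with -pthread */
    pthread_t t;
    pthread_create(&t, NULL, producer, NULL);
    pthread_join(t, NULL);

    pthread_mutex_lock(&lock);
    printf("consumer read %d\n", shared_value);  /* communicate by loading it */
    pthread_mutex_unlock(&lock);
    return 0;
}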
In contrast, the clusters and warehouse-scale computers of the next chapter look like
individual computers connected by a network, and the memory of one processor cannot be
accessed by another processor without the assistance of software protocols running on both
processors. In such designs, message-passing protocols are used to communicate data among
processors.
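By way of contrast, here is a hedged sketch of the same producer/consumer exchange on a cluster using MPI (a common message-passing library, chosen here for illustration): process 0 must explicitly send the value and process 1 must explicitly receive it, because neither process can load or store the other's memory directly.

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {      /* run with: mpirun -np 2 ./a.out */
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    int value;
    if (rank == 0) {
        value = 42;
        /* The data moves only because software copies it into a message. */
        MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("rank 1 received %d\n", value);
    }

    MPI_Finalize();
    return 0;
}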
Challenges of Parallel Processing
The application of multiprocessors ranges from running independent tasks with essentially no
communication to running parallel programs where threads must communicate to complete
the task. Two important hurdles, both explainable with Amdahl's law, make parallel processing
challenging.
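For reference, Amdahl's law bounds the attainable speedup by the fraction of the computation that can be parallelized. As a quick illustration with numbers chosen here (not taken from the excerpt), a program that is 99% parallel still achieves a speedup of only about 50 on 100 processors, because the 1% sequential portion dominates:

\[
\text{Speedup} \;=\; \frac{1}{(1-\text{Fraction}_{\text{parallel}}) + \dfrac{\text{Fraction}_{\text{parallel}}}{\text{Processors}}}
\;=\; \frac{1}{0.01 + \dfrac{0.99}{100}} \;\approx\; 50.
\]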