In that appendix, we examine the nature of such applications and the challenges of achieving speedup with dozens to hundreds of processors.
5.2 Centralized Shared-Memory Architectures
The observation that the use of large, multilevel caches can substantially reduce the memory bandwidth demands of a processor is the key insight that motivates centralized memory multiprocessors. Originally, these processors were all single-core and often took an entire board, and memory was located on a shared bus. With more recent, higher-performance processors, the memory demands have outstripped the capability of reasonable buses, and recent microprocessors directly connect memory to a single chip, which is sometimes called a backside or memory bus to distinguish it from the bus used to connect to I/O. Accessing a chip's local memory, whether for an I/O operation or for an access from another chip, requires going through the chip that “owns” that memory. Thus, access to memory is asymmetric: faster to the local memory and slower to the remote memory. In a multicore, that memory is shared among all the cores on a single chip, but the asymmetric access to the memory of one multicore from another remains.
Symmetric shared-memory machines usually support the caching of both shared and private data. Private data are used by a single processor, while shared data are used by multiple processors, essentially providing communication among the processors through reads and writes of the shared data. When a private item is cached, its location is migrated to the cache, reducing the average access time as well as the memory bandwidth required. Since no other processor uses the data, the program behavior is identical to that in a uniprocessor. When shared data are cached, the shared value may be replicated in multiple caches. In addition to the reduction in access latency and required memory bandwidth, this replication also provides a reduction in contention that may exist for shared data items that are being read by multiple processors simultaneously. Caching of shared data, however, introduces a new problem: cache coherence.
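To make the distinction concrete, the following C sketch (our illustration, not from the text) shows both kinds of data in a two-thread program: each thread's loop index and partial sum are private, while the globals result and ready are shared and carry communication between the threads through ordinary reads and writes.

#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>

static int result;               /* shared: written by worker, read by main */
static atomic_int ready;         /* shared: signals that result is valid    */

static void *worker(void *arg)
{
    int i, sum = 0;              /* private: live only in this thread       */
    for (i = 1; i <= 100; i++)
        sum += i;
    result = sum;                /* communication happens through writes    */
    atomic_store(&ready, 1);     /* to shared locations...                  */
    return NULL;
}

int main(void)
{
    pthread_t t;
    pthread_create(&t, NULL, worker, NULL);
    while (!atomic_load(&ready)) /* ...and reads of those same locations    */
        ;
    printf("result = %d\n", result);   /* prints 5050 */
    pthread_join(&t, NULL);
    return 0;
}

Because each core may hold cached copies of result and ready, it is the coherence mechanism discussed next that guarantees the spinning reader eventually observes the worker's writes.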
What Is Multiprocessor Cache Coherence?
Unfortunately, caching shared data introduces a new problem because the view of memory held by two different processors is through their individual caches, which, without any additional precautions, could end up seeing two different values. Figure 5.3 illustrates the problem and shows how two different processors can have two different values for the same location. This difficulty is generally referred to as the cache coherence problem. Notice that the coherence problem exists because we have both a global state, defined primarily by the main memory, and a local state, defined by the individual caches, which are private to each processor core. Thus, in a multicore where some level of caching may be shared (for example, an L3), while some levels are private (for example, L1 and L2), the coherence problem still exists and must be solved.
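Because coherent hardware hides the problem, the easiest way to watch it happen is in a toy simulation. The following sketch (an illustration we add here; cache_t, read_word, and write_word are invented names) models a single location X in main memory plus two private write-back caches with no coherence protocol, reproducing the divergence shown in Figure 5.3.

#include <stdio.h>

static int memory_X = 0;                  /* location X in main memory    */

typedef struct { int valid, dirty, data; } cache_t;   /* one-block cache  */

static int read_word(cache_t *c)          /* load X through a cache       */
{
    if (!c->valid) {                      /* miss: fill from main memory  */
        c->data = memory_X;
        c->valid = 1;
    }
    return c->data;                       /* hit: return the cached copy  */
}

static void write_word(cache_t *c, int v)
{
    c->data = v;                          /* write-back: update the cache */
    c->dirty = 1;                         /* only; memory stays stale     */
    c->valid = 1;                         /* until the block is evicted   */
}

int main(void)
{
    cache_t a = {0}, b = {0};             /* private caches of cores A, B */

    printf("A reads X: %d\n", read_word(&a));   /* A caches X = 0         */
    printf("B reads X: %d\n", read_word(&b));   /* B caches X = 0         */
    write_word(&a, 1);                          /* A writes X = 1         */

    /* No protocol ever invalidated or updated B's copy, so the two
     * cores now see different values for the same location:              */
    printf("A reads X: %d\n", read_word(&a));   /* prints 1               */
    printf("B reads X: %d\n", read_word(&b));   /* prints 0 (stale)       */
    return 0;
}

A coherence protocol removes this divergence by making A's write either invalidate or update B's cached copy before B can read the location again.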
 