Although the three properties just described are sufficient to ensure coherence, the question
of when a written value will be seen is also important. To see why, observe that we cannot
require that a read of X instantaneously see the value written for X by some other processor.
If, for example, a write of X on one processor precedes a read of X on another processor by a
very small time, it may be impossible to ensure that the read returns the value of the data
written, since the written data may not even have left the processor at that point. The issue of
exactly when a written value must be seen by a reader is defined by a memory consistency model,
a topic discussed in Section 5.6.
Coherence and consistency are complementary: Coherence defines the behavior of reads
and writes to the same memory location, while consistency defines the behavior of reads and
writes with respect to accesses to other memory locations. For now, make the following two
assumptions. First, a write does not complete (and allow the next write to occur) until all pro-
cessors have seen the effect of that write. Second, the processor does not change the order of
any write with respect to any other memory access. These two conditions mean that, if a pro-
cessor writes location A followed by location B, any processor that sees the new value of B
must also see the new value of A. These restrictions allow the processor to reorder reads, but
force the processor to finish writes in program order. We will rely on this assumption until
we reach Section 5.6, where we will see exactly the implications of this definition, as well as
the alternatives.
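The two assumptions above can be illustrated with a toy model (a hypothetical sketch, not from the text): a shared memory in which each write becomes globally visible before the next write may begin, so the visibility order of writes matches program order.

```python
class SimpleCoherentMemory:
    """Toy shared memory obeying the two assumptions: a write completes
    (is seen by all processors) before the next write starts, and writes
    are never reordered with respect to other memory accesses."""

    def __init__(self):
        self.mem = {"A": 0, "B": 0}
        self.log = []  # order in which writes became globally visible

    def write(self, loc, value):
        # The write "completes" before this call returns, so no later
        # write can overtake it.
        self.mem[loc] = value
        self.log.append((loc, value))

    def read(self, loc):
        return self.mem[loc]


m = SimpleCoherentMemory()
m.write("A", 1)  # processor P0 writes A first...
m.write("B", 1)  # ...then B

# Any processor that sees the new value of B must also see the new A:
if m.read("B") == 1:
    assert m.read("A") == 1

# The global visibility order matches program order:
assert m.log == [("A", 1), ("B", 1)]
```

Under these rules a reader can never observe B == 1 while A still holds its old value, which is exactly the guarantee the text describes; Section 5.6 relaxes these assumptions.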
Basic Schemes For Enforcing Coherence
The coherence problem for multiprocessors and I/O, although similar in origin, has different
characteristics that affect the appropriate solution. Unlike I/O, where multiple data copies
are a rare event (one to be avoided whenever possible), a program running on multiple
processors will normally have copies of the same data in several caches. In a coherent
multiprocessor, the caches provide both migration and replication of shared data items.
Coherent caches provide migration, since a data item can be moved to a local cache and
used there in a transparent fashion. This migration reduces both the latency to access a shared
data item that is allocated remotely and the bandwidth demand on the shared memory.
Coherent caches also provide replication for shared data that are being simultaneously read,
since the caches make a copy of the data item in the local cache. Replication reduces both
latency of access and contention for a read-shared data item. Supporting this migration and
replication is critical to performance in accessing shared data. Thus, rather than trying to
solve the problem by avoiding it in software, multiprocessors adopt a hardware solution by
introducing a protocol to maintain coherent caches.
The protocols to maintain coherence for multiple processors are called cache coherence
protocols. Key to implementing a cache coherence protocol is tracking the state of any sharing
of a data block. There are two classes of protocols in use, each of which uses different
techniques to track the sharing status:
Directory based—The sharing status of a particular block of physical memory is kept in one
location, called the directory. There are two very different types of directory-based cache
coherence. In an SMP, we can use one centralized directory, associated with the memory
or some other single serialization point, such as the outermost cache in a multicore. In a
DSM, it makes no sense to have a single directory, since that would create a single point
of contention and make it difficult to scale to many multicore chips given the memory
demands of multicores with eight or more cores. Distributed directories are more complex
than a single directory, and such designs are the subject of Section 5.4.
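A centralized directory of the kind described above can be sketched as a table with one entry per memory block, each entry recording a sharing state and the set of caches holding a copy. The following is a simplified illustration, assuming a common three-state scheme (uncached, shared, modified); the class and method names are illustrative, not taken from the text, and a real protocol would also fetch dirty data back from an owner before servicing other requests.

```python
class DirectoryEntry:
    """Sharing status of one block of physical memory."""

    def __init__(self):
        self.state = "uncached"  # "uncached", "shared", or "modified"
        self.sharers = set()     # IDs of processors holding a copy


class Directory:
    """Centralized directory: one entry per block, kept at a single
    serialization point (e.g., the memory or outermost cache)."""

    def __init__(self):
        self.entries = {}

    def entry(self, block):
        return self.entries.setdefault(block, DirectoryEntry())

    def read_miss(self, block, proc):
        # Record the requester as a sharer; the block becomes read-shared.
        # (Fetching a dirty copy back from a "modified" owner is elided.)
        e = self.entry(block)
        e.sharers.add(proc)
        e.state = "shared"

    def write_miss(self, block, proc):
        # Invalidate all other copies, then grant exclusive ownership.
        e = self.entry(block)
        e.sharers = {proc}
        e.state = "modified"


d = Directory()
d.read_miss(0x40, proc=0)
d.read_miss(0x40, proc=1)
assert d.entry(0x40).state == "shared"
assert d.entry(0x40).sharers == {0, 1}

d.write_miss(0x40, proc=2)
assert d.entry(0x40).state == "modified"
assert d.entry(0x40).sharers == {2}
```

In a DSM machine, this single table would instead be partitioned across the nodes so that each node tracks the blocks of its local memory, which is what makes distributed directories more complex.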