Although the three properties just described are sufficient to ensure coherence, the question
of when a written value will be seen is also important. To see why, observe that we cannot
require that a read of X instantaneously see the value written for X by some other processor.
If, for example, a write of X on one processor precedes a read of X on another processor by a
very small time, it may be impossible to ensure that the read returns the value of the data
written, since the written data may not even have left the processor at that point. The issue of
exactly when a written value must be seen by a reader is defined by a memory consistency model,
a topic discussed in Section 5.6.
Coherence and consistency are complementary: Coherence defines the behavior of reads
and writes to the same memory location, while consistency defines the behavior of reads and
writes with respect to accesses to other memory locations. For now, make the following two
assumptions. First, a write does not complete (and allow the next write to occur) until all pro-
cessors have seen the effect of that write. Second, the processor does not change the order of
any write with respect to any other memory access. These two conditions mean that, if a pro-
cessor writes location A followed by location B, any processor that sees the new value of B
must also see the new value of A. These restrictions allow the processor to reorder reads, but
force the processor to finish writes in program order. We will rely on this assumption until
we reach Section 5.6, where we will see exactly the implications of this definition, as well as
the alternatives.
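The two assumptions above can be illustrated with a toy model (a hypothetical sketch, not from the text): a shared memory in which each write becomes globally visible before the next write may begin, so the visibility order of writes matches program order.

```python
class SimpleCoherentMemory:
    """Toy shared memory obeying the two assumptions: a write completes
    (is seen by all processors) before the next write starts, and writes
    are never reordered with respect to other memory accesses."""

    def __init__(self):
        self.mem = {"A": 0, "B": 0}
        self.log = []  # order in which writes became globally visible

    def write(self, loc, value):
        # The write "completes" before this call returns, so no later
        # write can overtake it.
        self.mem[loc] = value
        self.log.append((loc, value))

    def read(self, loc):
        return self.mem[loc]


m = SimpleCoherentMemory()
m.write("A", 1)  # processor P0 writes A first...
m.write("B", 1)  # ...then B

# Any processor that sees the new value of B must also see the new A:
if m.read("B") == 1:
    assert m.read("A") == 1

# The global visibility order matches program order:
assert m.log == [("A", 1), ("B", 1)]
```

Under these rules a reader can never observe B == 1 while A still holds its old value, which is exactly the guarantee the text describes; Section 5.6 relaxes these assumptions.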
Basic Schemes For Enforcing Coherence
The coherence problem for multiprocessors and I/O, although similar in origin, has different
characteristics that affect the appropriate solution. Unlike I/O, where multiple data copies
are a rare event (one to be avoided whenever possible), a program running on multiple
processors will normally have copies of the same data in several caches. In a coherent
multiprocessor, the caches provide both migration and replication of shared data items.
Coherent caches provide migration, since a data item can be moved to a local cache and
used there in a transparent fashion. This migration reduces both the latency to access a shared
data item that is allocated remotely and the bandwidth demand on the shared memory.
Coherent caches also provide replication for shared data that are being simultaneously read,
since the caches make a copy of the data item in the local cache. Replication reduces both
latency of access and contention for a read-shared data item. Supporting this migration and
replication is critical to performance in accessing shared data. Thus, rather than trying to
solve the problem by avoiding it in software, multiprocessors adopt a hardware solution by
introducing a protocol to maintain coherent caches.
The protocols to maintain coherence for multiple processors are called cache coherence
protocols. Key to implementing a cache coherence protocol is tracking the state of any sharing
of a data block. There are two classes of protocols in use, each of which uses different
techniques to track the sharing status:
Directory based—The sharing status of a particular block of physical memory is kept in one
location, called the directory. There are two very different types of directory-based cache
coherence. In an SMP, we can use one centralized directory, associated with the memory
or some other single serialization point, such as the outermost cache in a multicore. In a
DSM, it makes no sense to have a single directory, since that would create a single point
of contention and make it difficult to scale to many multicore chips given the memory
demands of multicores with eight or more cores. Distributed directories are more complex
than a single directory, and such designs are the subject of Section 5.4.
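A centralized directory of the kind described above can be sketched as a table with one entry per memory block, each entry recording a sharing state and the set of caches holding a copy. The following is a simplified illustration, assuming a common three-state scheme (uncached, shared, modified); the class and method names are illustrative, not taken from the text, and a real protocol would also fetch dirty data back from an owner before servicing other requests.

```python
class DirectoryEntry:
    """Sharing status of one block of physical memory."""

    def __init__(self):
        self.state = "uncached"  # "uncached", "shared", or "modified"
        self.sharers = set()     # IDs of processors holding a copy


class Directory:
    """Centralized directory: one entry per block, kept at a single
    serialization point (e.g., the memory or outermost cache)."""

    def __init__(self):
        self.entries = {}

    def entry(self, block):
        return self.entries.setdefault(block, DirectoryEntry())

    def read_miss(self, block, proc):
        # Record the requester as a sharer; the block becomes read-shared.
        # (Fetching a dirty copy back from a "modified" owner is elided.)
        e = self.entry(block)
        e.sharers.add(proc)
        e.state = "shared"

    def write_miss(self, block, proc):
        # Invalidate all other copies, then grant exclusive ownership.
        e = self.entry(block)
        e.sharers = {proc}
        e.state = "modified"


d = Directory()
d.read_miss(0x40, proc=0)
d.read_miss(0x40, proc=1)
assert d.entry(0x40).state == "shared"
assert d.entry(0x40).sharers == {0, 1}

d.write_miss(0x40, proc=2)
assert d.entry(0x40).state == "modified"
assert d.entry(0x40).sharers == {2}
```

In a DSM machine, this single table would instead be partitioned across the nodes so that each node tracks the blocks of its local memory, which is what makes distributed directories more complex.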