ively). These next-level interconnects are not just extensions of the shared bus, but use a dif-
ferent approach for interconnecting multicores.
A multiprocessor built with multiple multicore chips will have a distributed memory archi-
tecture and will need an interchip coherency mechanism above and beyond the one within the
chip. In most cases, some form of directory scheme is used.
Extensions To The Basic Coherence Protocol
The coherence protocol we have just described is a simple three-state protocol and is often
referred to by the first letter of its states, making it an MSI (Modified, Shared, Invalid) protocol.
There are many extensions of this basic protocol, which we mentioned in the captions of
figures in this section. These extensions are created by adding additional states and transactions,
which optimize certain behaviors, possibly resulting in improved performance. Two of the
most common extensions are:
1. MESI adds the state Exclusive to the basic MSI protocol to indicate when a cache block
is resident only in a single cache but is clean. If a block is in the E state, it can be written
without generating any invalidates, which optimizes the case where a block is read by a
single cache before being written by that same cache. Of course, when a read miss to a
block in the E state occurs, the block must be changed to the S state to maintain coherence.
Because all subsequent accesses are snooped, it is possible to maintain the accuracy of this
state. In particular, if another processor issues a read miss, the state is changed from ex-
clusive to shared. The advantage of adding this state is that a subsequent write to a block in
the exclusive state by the same core need not acquire bus access or generate an invalidate,
since the block is known to be exclusively in this local cache; the processor merely changes
the state to modified. This state is easily added by using the bit that encodes the coherent
state as an exclusive state and using the dirty bit to indicate that a block is modified. The
popular MESI protocol, which is named for the four states it includes (Modified, Exclus-
ive, Shared, and Invalid), uses this structure. The Intel i7 uses a variant of a MESI protocol,
called MESIF, which adds a state (Forward) to designate which sharing processor should
respond to a request. It is designed to enhance performance in distributed memory organ-
izations.
2. MOESI adds the state Owned to the MESI protocol to indicate that the associated block is
owned by that cache and out-of-date in memory. In MSI and MESI protocols, when there
is an attempt to share a block in the Modified state, the state is changed to Shared (in both
the original and newly sharing cache), and the block must be written back to memory. In a
MOESI protocol, the block can be changed from the Modified to Owned state in the original
cache without writing it to memory. Other caches, which are newly sharing the block,
keep the block in the Shared state; the O state, which only the original cache holds, indic-
ates that the main memory copy is out of date and that the designated cache is the owner.
The owner of the block must supply it on a miss, since memory is not up to date, and must
write the block back to memory if it is replaced. The AMD Opteron uses the MOESI protocol.
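The transitions described in the two items above can be sketched as a small state machine. This is a simplified illustration in Python, not the implementation used by any real processor: the names State, local_write, remote_read_miss, and eviction_needs_writeback are hypothetical, only the transitions mentioned in the text are modeled, and treating a write to an Owned block like a write to a Shared block is an assumption.

```python
from enum import Enum


class State(Enum):
    M = "Modified"
    O = "Owned"      # exists only in MOESI
    E = "Exclusive"  # exists in MESI and MOESI
    S = "Shared"
    I = "Invalid"


def local_write(state):
    """Write by the local core: return (new_state, needs_bus_transaction)."""
    if state in (State.M, State.E):
        # M is already dirty and exclusive. E is clean but held by this cache
        # alone, so the upgrade to M needs no invalidate and no bus access.
        return State.M, False
    # S or I: other copies may exist (or the block is missing), so an
    # invalidate or write miss must go on the bus. O is handled the same
    # way here, which is an assumption not spelled out in the text.
    return State.M, True


def remote_read_miss(state, protocol="MESI"):
    """Another cache read-misses on this block.

    Returns (new_state, write_back_to_memory).
    """
    if state is State.M:
        if protocol == "MOESI":
            # The original cache becomes the owner; memory stays out of date.
            return State.O, False
        # MSI/MESI: the block is written back and becomes Shared.
        return State.S, True
    if state is State.E:
        # Clean block, so no write-back is needed; it is simply shared now.
        return State.S, False
    # O stays O (the owner keeps supplying the block); S and I are unchanged.
    return state, False


def eviction_needs_writeback(state):
    """Memory is out of date for M and O, so replacement must write back."""
    return state in (State.M, State.O)
```

For example, remote_read_miss(State.M, "MESI") yields a Shared block plus a write-back, while the same call with "MOESI" yields an Owned block and no memory traffic, which is the saving the O state is meant to provide.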
The next section examines the performance of these protocols for our parallel and multipro-
grammed workloads; the value of these extensions to a basic protocol will be clear when we
examine the performance. But, before we do that, let's take a brief look at the limitations on
the use of a symmetric memory structure and a snooping coherence scheme.