FIGURE 5.4 An example of an invalidation protocol working on a snooping bus for a
single cache block (X) with write-back caches. We assume that neither cache initially
holds X and that the value of X in memory is 0. The processor and memory contents show the
value after the processor and bus activity have both completed. A blank indicates no activity
or no copy cached. When the second miss by B occurs, processor A responds with the value,
canceling the response from memory. In addition, both the contents of B's cache and the
memory contents of X are updated. This update of memory, which occurs when a block becomes
shared, simplifies the protocol, but it is possible to track the ownership and force the
write-back only if the block is replaced. This requires the introduction of an additional state
called “owner,” which indicates that a block may be shared, but the owning processor is
responsible for updating any other processors and memory when it changes the block or
replaces it. If a multicore uses a shared cache (e.g., L3), then all memory is seen through the
shared cache; L3 acts like the memory in this example, and coherency must be handled for
the private L1 and L2 for each core. It is this observation that led some designers to opt for a
directory protocol within the multicore. To make this work, the L3 cache must be inclusive (see
page 397).
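The sequence the caption describes can be sketched as a tiny simulation. This is a minimal, hypothetical Python model, assuming the activity sequence "A reads X, B reads X, A writes 1 to X, B reads X" (consistent with the caption's second miss by B), two caches, a one-word block, and only two stable states, shared (S) and modified (M). The names `System`, `read`, `write`, and `value` are illustrative, not taken from the text.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

# Hypothetical model of ONE cache block (X) under a write-back,
# write-invalidate snooping protocol. A cache entry is None (no copy)
# or a (state, value) pair with state "S" (shared) or "M" (modified).
Entry = Optional[Tuple[str, int]]


@dataclass
class System:
    memory: int = 0                                   # value of X in memory
    caches: Dict[str, Entry] = field(default_factory=dict)

    def value(self, cpu: str) -> Optional[int]:
        entry = self.caches.get(cpu)
        return None if entry is None else entry[1]

    def read(self, cpu: str) -> str:
        if self.caches.get(cpu) is not None:
            return "hit"                              # no bus activity
        # Read miss: every other cache snoops the request.
        for other, entry in list(self.caches.items()):
            if other != cpu and entry is not None and entry[0] == "M":
                # The owner supplies the data (canceling memory's response),
                # and memory is updated because the block becomes shared.
                self.memory = entry[1]
                self.caches[other] = ("S", entry[1])
        self.caches[cpu] = ("S", self.memory)
        return "cache miss for X"

    def write(self, cpu: str, value: int) -> str:
        entry = self.caches.get(cpu)
        if entry is not None and entry[0] == "M":
            self.caches[cpu] = ("M", value)           # already exclusive: no bus activity
            return "hit"
        # Broadcast an invalidate: all other copies are removed.
        # Simplification: X is one word and is fully overwritten, so no data is fetched.
        for other in list(self.caches):
            if other != cpu:
                self.caches[other] = None
        self.caches[cpu] = ("M", value)               # write-back: memory is NOT updated
        return "invalidation for X"


if __name__ == "__main__":
    s = System(caches={"A": None, "B": None})
    steps = [
        ("A reads X", lambda: s.read("A")),
        ("B reads X", lambda: s.read("B")),
        ("A writes 1 to X", lambda: s.write("A", 1)),
        ("B reads X", lambda: s.read("B")),
    ]
    for label, op in steps:
        bus = op()
        print(f"{label:16} | {bus:18} | A={s.value('A')} B={s.value('B')} mem={s.memory}")
```

After the write, B holds no copy and memory still holds 0. After B's second miss, A, B, and memory all hold 1, because A supplied the data and memory was updated as the block became shared. Tracking an "owner" state instead would let memory remain stale until A replaces the block.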
The alternative to an invalidate protocol is to update all the cached copies of a data item
when that item is written. This type of protocol is called a write update or write broadcast
protocol. Because a write update protocol must broadcast all writes to shared cache lines, it
consumes considerably more bandwidth. For this reason, recent multiprocessors have opted to
implement a write invalidate protocol, and we will focus only on invalidate protocols for the
rest of the chapter.
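The bandwidth argument can be made concrete by counting bus broadcasts. The sketch below is a simplified, hypothetical model: one processor performs a run of consecutive writes to a block that at least one other cache already shares, and no other processor touches the block in the meantime. The function name `bus_broadcasts` is illustrative.

```python
def bus_broadcasts(num_writes: int, protocol: str) -> int:
    """Bus transactions for `num_writes` consecutive writes by one processor to a
    block that other caches share, assuming no other processor touches it meanwhile."""
    if protocol not in ("invalidate", "update"):
        raise ValueError(f"unknown protocol: {protocol!r}")
    if num_writes < 0:
        raise ValueError("num_writes must be non-negative")
    if num_writes == 0:
        return 0
    if protocol == "invalidate":
        # The first write broadcasts an invalidate; the writer then holds the only
        # copy, so the remaining writes complete locally in its cache.
        return 1
    # Write update: every write to a shared line is broadcast to refresh the other copies.
    return num_writes


if __name__ == "__main__":
    for n in (1, 4, 16):
        print(n, bus_broadcasts(n, "invalidate"), bus_broadcasts(n, "update"))
```

Four consecutive writes cost one broadcast under invalidation but four under update. The model deliberately ignores the misses other processors take if they later reread an invalidated block, which is the traffic an update protocol avoids.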
Basic Implementation Techniques
The key to implementing an invalidate protocol in a multicore is the use of the bus, or another
broadcast medium, to perform invalidates. In older multiple-chip multiprocessors, the bus
used for coherence is the shared-memory access bus. In a multicore, the bus can be the connection
between the private caches (L1 and L2 in the Intel Core i7) and the shared outer cache
(L3 in the i7). To perform an invalidate, the processor simply acquires bus access and broadcasts
the address to be invalidated on the bus. All processors continuously snoop on the bus,
watching the addresses. The processors check whether the address on the bus is in their cache.
If so, the corresponding data in the cache are invalidated.
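The snooping step can be sketched as follows. This is a hypothetical Python model, assuming 64-byte blocks and caches that simply record which block addresses they hold. `Bus`, `SnoopingCache`, `fill`, and `block_of` are illustrative names. For simplicity the writer always broadcasts, whereas a real protocol skips the broadcast when it already holds the block exclusively.

```python
from __future__ import annotations

from typing import Dict, List

BLOCK_SIZE = 64  # assumed block size in bytes (illustrative)


def block_of(addr: int) -> int:
    """Block-aligned address, so any byte in a block maps to the same key."""
    return addr // BLOCK_SIZE * BLOCK_SIZE


class Bus:
    def __init__(self) -> None:
        self.snoopers: List[SnoopingCache] = []

    def broadcast_invalidate(self, source: SnoopingCache, addr: int) -> List[str]:
        # The writer has acquired the bus; every other cache sees the address.
        invalidated = []
        for cache in self.snoopers:
            if cache is not source and cache.snoop_invalidate(addr):
                invalidated.append(cache.name)
        return invalidated


class SnoopingCache:
    def __init__(self, name: str, bus: Bus) -> None:
        self.name = name
        self.bus = bus
        self.blocks: Dict[int, int] = {}  # block address -> cached value (simplified)
        bus.snoopers.append(self)

    def fill(self, addr: int, value: int) -> None:
        self.blocks[block_of(addr)] = value

    def snoop_invalidate(self, addr: int) -> bool:
        # Check whether the broadcast address is in this cache; if so, drop the block.
        blk = block_of(addr)
        if blk in self.blocks:
            del self.blocks[blk]
            return True
        return False

    def write(self, addr: int, value: int) -> List[str]:
        invalidated = self.bus.broadcast_invalidate(self, addr)
        self.blocks[block_of(addr)] = value
        return invalidated


if __name__ == "__main__":
    bus = Bus()
    a, b, c = (SnoopingCache(name, bus) for name in "ABC")
    b.fill(0x1000, 0)
    c.fill(0x1000, 0)
    print(a.write(0x1008, 1))  # prints ['B', 'C']
```

B and C hold the block at 0x1000, so A's write to 0x1008, which falls in the same 64-byte block, invalidates both copies.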
When a write to a block that is shared occurs, the writing processor must acquire bus access
to broadcast its invalidation. If two processors attempt to write shared blocks at the same time,
their attempts to broadcast an invalidate operation will be serialized when they arbitrate for
the bus. The first processor to obtain bus access will cause any other copies of the block it is
writing to be invalidated. If the processors were attempting to write the same block, the serial-