When the block is in the exclusive state, the current value of the block is held in a cache on the
node identified by the set Sharers (the owner), so there are three possible directory requests:
Read miss: The owner is sent a data fetch message, which causes the state of the block in
the owner's cache to transition to shared and causes the owner to send the data to the
directory, where it is written to memory and sent back to the requesting processor. The
identity of the requesting node is added to the set Sharers, which still contains the identity
of the processor that was the owner (since it still has a readable copy).
Data write-back: The owner is replacing the block and therefore must write it back. This
write-back makes the memory copy up to date (the home directory essentially becomes the
owner), the block is now uncached, and the Sharers set is empty.
Write miss: The block has a new owner. A message is sent to the old owner, causing the
cache to invalidate the block and send the value to the directory, from which it is sent to
the requesting node, which becomes the new owner. Sharers is set to the identity of the
new owner, and the state of the block remains exclusive.
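The three cases above can be sketched as a small state update on a directory entry. This is only an illustrative model, not the protocol's actual implementation: the class `DirectoryEntry`, the function `handle_exclusive`, the message labels, and the `owner_value` parameter (standing in for the data the owner's cache returns) are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class DirectoryEntry:
    """Directory state for one memory block (hypothetical model)."""
    state: str = "uncached"                     # "uncached", "shared", or "exclusive"
    sharers: set = field(default_factory=set)   # the set Sharers
    memory_value: int = 0                       # copy held in home memory


def handle_exclusive(entry, request, requester=None, owner_value=None):
    """Apply one directory request to a block in the exclusive state.

    Returns the messages the directory would send as (destination, kind)
    pairs. The message kinds are illustrative labels only.
    """
    if entry.state != "exclusive" or len(entry.sharers) != 1:
        raise ValueError("block must be exclusive with exactly one owner")
    (owner,) = entry.sharers
    messages = []

    if request == "read_miss":
        # Owner supplies the data and keeps a readable copy.
        messages.append((owner, "fetch"))
        entry.memory_value = owner_value        # written to home memory
        entry.sharers.add(requester)            # old owner stays in Sharers
        entry.state = "shared"
        messages.append((requester, "data_value_reply"))
    elif request == "data_write_back":
        # Owner is replacing the block; memory becomes up to date.
        entry.memory_value = owner_value
        entry.sharers.clear()
        entry.state = "uncached"
    elif request == "write_miss":
        # Old owner invalidates its copy and returns the value via the directory.
        messages.append((owner, "fetch_invalidate"))
        entry.memory_value = owner_value
        entry.sharers = {requester}             # requester is the new owner
        messages.append((requester, "data_value_reply"))
        # State remains exclusive.
    else:
        raise ValueError(f"unknown request: {request}")
    return messages
```

For example, starting from an entry owned by a node named `"P1"`, a read miss from `"P2"` leaves the entry shared with both nodes in `sharers`, while a write miss from `"P2"` leaves it exclusive with `sharers == {"P2"}`.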
The state transition diagram in Figure 5.23 is a simplification, just as it was in the snooping
cache case. In the case of a directory, as well as a snooping scheme implemented with a
network other than a bus, our protocols will need to deal with nonatomic memory transactions.
Appendix I explores these issues in depth.
The directory protocols used in real multiprocessors contain additional optimizations. In
particular, in this protocol, when a read or write miss occurs for a block that is exclusive, the
block is first sent to the directory at the home node. From there it is stored into the home
memory and also sent to the original requesting node. Many of the protocols in use in
commercial multiprocessors instead forward the data from the owner node directly to the
requesting node (as well as performing the write-back to the home). Such optimizations often
add complexity by increasing the possibility of deadlock and by increasing the types of
messages that must be handled.
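To make the difference between the two data paths concrete, the sketch below lists the messages involved in a miss to an exclusive block, with and without forwarding from the owner. It is a simplified illustration under stated assumptions: the function name, the node names, and the message labels are hypothetical, and real protocols carry more message types than shown here.

```python
def miss_to_exclusive_block(requester, home, owner, forward_from_owner=False):
    """List the messages for a miss to a block held exclusive by `owner`.

    Returns (critical_path, off_path), each a list of (source, destination,
    kind) tuples. critical_path holds the messages the requester must wait
    for; off_path holds messages that can complete in the background.
    """
    # Both variants start the same way: the miss goes to the home
    # directory, which asks the owner for the block.
    request = [(requester, home, "miss"), (home, owner, "fetch")]

    if forward_from_owner:
        # Optimized variant: the owner replies directly to the requester
        # and separately writes the block back to the home node.
        critical_path = request + [(owner, requester, "data")]
        off_path = [(owner, home, "write_back")]
    else:
        # Protocol described in the text: data returns through the home.
        critical_path = request + [(owner, home, "data"),
                                   (home, requester, "data")]
        off_path = []
    return critical_path, off_path
```

In this model the baseline puts four messages on the requester's critical path, while forwarding puts three there and moves the write-back off the path. The cost is that the owner now sends two different kinds of message to two destinations, which is one way the number of message types grows.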
Implementing a directory scheme requires solving most of the same challenges we
discussed for snooping protocols beginning on page 365. There are, however, additional
problems, which we describe in Appendix I. In Section 5.8, we briefly describe how modern
multicores extend coherence beyond a single chip. The combinations of multichip coherence
and multicore coherence include all four possibilities of snooping/snooping (AMD Opteron),
snooping/directory, directory/snooping, and directory/directory!
5.5 Synchronization: The Basics
Synchronization mechanisms are typically built with user-level software routines that rely on
hardware-supplied synchronization instructions. For smaller multiprocessors or
low-contention situations, the key hardware capability is an uninterruptible instruction or
instruction sequence capable of atomically retrieving and changing a value. Software
synchronization mechanisms are then constructed using this capability. In this section, we
focus on the implementation of lock and unlock synchronization operations. Lock and unlock
can be used straightforwardly to create mutual exclusion, as well as to implement more
complex synchronization mechanisms.
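As a rough illustration of how lock and unlock can be layered on a single atomic read-and-modify capability, the sketch below builds a spin lock on an atomic exchange. It is a software model only. The `AtomicCell` class is a hypothetical stand-in for a memory word with a hardware exchange instruction, and its internal `threading.Lock` exists purely to model the hardware's atomicity. The choice of exchange as the primitive, the 0 = free and 1 = held encoding, and all names are assumptions made for the example.

```python
import threading
import time


class AtomicCell:
    """Software stand-in for a memory word with an atomic exchange."""

    def __init__(self, value=0):
        self._value = value
        self._guard = threading.Lock()  # models hardware atomicity only

    def exchange(self, new):
        """Atomically store `new` and return the previous value."""
        with self._guard:
            old = self._value
            self._value = new
            return old

    def store(self, new):
        with self._guard:
            self._value = new


class SpinLock:
    """Lock built from atomic exchange: 0 means free, 1 means held."""

    def __init__(self):
        self._cell = AtomicCell(0)

    def lock(self):
        # Keep swapping in 1 until the old value was 0, meaning this
        # caller is the one that changed the lock from free to held.
        while self._cell.exchange(1) != 0:
            time.sleep(0)  # yield to other threads; hardware would just spin

    def unlock(self):
        self._cell.store(0)


def run_counter_demo(num_threads=4, increments=200):
    """Increment a shared counter under the spin lock; return the total."""
    lock = SpinLock()
    total = [0]

    def worker():
        for _ in range(increments):
            lock.lock()
            total[0] += 1      # critical section
            lock.unlock()

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total[0]
```

Because every increment happens between `lock()` and `unlock()`, `run_counter_demo(4, 200)` should return 800 regardless of how the threads interleave.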
In high-contention situations, synchronization can become a performance bottleneck
because contention introduces additional delays and because latency is potentially greater in
such a multiprocessor. We discuss how the basic synchronization mechanisms of this section
can be extended for large processor counts in Appendix I.
 