is not an optimization. Rather, to ensure forward progress, protocol implementations must
ensure that they perform at least one CPU operation before relinquishing a block. Suppose
the coherence protocol implementation did not do this. Explain how this might lead to live-
lock. Give a simple code example that could stimulate this behavior.
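The forward-progress requirement above can be illustrated with a toy model. The sketch below is not a real coherence protocol: it assumes two hypothetical processors that both want to store to the same block, and a competing request that always arrives just as the current holder receives the block. The function name `simulate` and the flag `perform_before_relinquish` are invented for this illustration.

```python
def simulate(rounds, perform_before_relinquish):
    """Toy model: two processors contend for one block.

    Returns the number of stores each processor completed.
    Assumption: a competing request arrives every time the
    current holder receives the block.
    """
    completed = [0, 0]
    owner = 0  # processor currently holding the block
    for _ in range(rounds):
        if perform_before_relinquish:
            # The holder performs one store before giving the block up.
            completed[owner] += 1
        # The holder relinquishes the block to the competing requester.
        owner = 1 - owner
    return completed


if __name__ == "__main__":
    print("without guarantee:", simulate(10, False))
    print("with guarantee:   ", simulate(10, True))
```

In this model, when the holder is allowed to give up the block before performing its store, ownership keeps moving back and forth while neither processor completes a store, which is the livelock pattern the exercise asks about. With the guarantee in place, every transfer of ownership is accompanied by one completed store.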
5.17 [20/30] <5.4> Some directory protocols add an Owned (O) state to the protocol, similar
to the optimization discussed for snooping protocols. The Owned state behaves like the
Shared state in that nodes may only read Owned blocks, but it behaves like the Modified
state in that nodes must supply data on other nodes' Get requests to Owned blocks. The
Owned state eliminates the case where a GetShared request to a block in state Modified
requires the node to send the data to both the requesting processor and the memory. In a
MOSI directory protocol, a GetShared request to a block in either the Modified or Owned
states supplies data to the requesting node and transitions to the Owned state. A GetModified
request in state Owned is handled like a request in state Modified. This optimized
MOSI protocol only updates memory when a node replaces a block in state Modified or
Owned.
a. [20] <5.4> Explain why the MSA state in the protocol is essentially a “transient”
Owned state.
b. [30] <5.4> Modify the cache and directory protocol tables to support a stable Owned
state.
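The Owned-state behavior described in exercise 5.17 can be sketched as a small transition function for one cache's copy of a block. This is a minimal sketch, not the protocol tables from the text: the names `State`, `on_remote_request`, and `on_replace` are hypothetical, and it assumes that a GetModified to a Modified block supplies data and invalidates the local copy (the exercise only says that Owned is handled the same way as Modified).

```python
from enum import Enum


class State(Enum):
    I = "Invalid"
    S = "Shared"
    O = "Owned"
    M = "Modified"


def on_remote_request(state, request):
    """Return (next_state, supplies_data) for another node's Get request."""
    if request == "GetShared":
        if state in (State.M, State.O):
            # The owner supplies data and stays (or becomes) Owned;
            # memory is not updated.
            return State.O, True
        return state, False
    if request == "GetModified":
        if state in (State.M, State.O):
            # Assumption: Owned is handled like Modified, so the node
            # supplies data and gives up its copy.
            return State.I, True
        # A Shared copy is invalidated; an Invalid copy stays Invalid.
        return State.I, False
    raise ValueError(f"unknown request: {request}")


def on_replace(state):
    """Return (next_state, writes_back_to_memory) when the block is replaced."""
    # Memory is updated only when a Modified or Owned block is replaced.
    return State.I, state in (State.M, State.O)
```

Transient states such as MSA are deliberately left out; the sketch only covers the stable states M, O, S, and I.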
5.18 [25/25] <5.4> The advanced directory protocol described above relies on a point-to-point
ordered interconnect to ensure correct operation. Assuming the initial cache contents of
Figure 5.38 and the following sequences of operations, explain what problem could arise
if the interconnect failed to maintain point-to-point ordering. Assume that the processors
perform the requests at the same time, but they are processed by the directory in the order
shown.
a. [25] <5.4> P1,0: read 110
P3,1: write 110 <-- 90
b. [25] <5.4> P1,0: read 110
P0,0: replace 110
Exercises
5.19 [15] <5.1> Assume that we have a function for an application of the form F ( i , p ), which
gives the fraction of time that exactly i processors are usable given that a total of p pro-
cessors is available. That means that
$$\sum_{i=1}^{p} F(i, p) = 1$$
Assume that when i processors are in use, the applications run i times faster. Rewrite Am-
dahl's law so it gives the speedup as a function of p for some application.
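One possible reading of exercise 5.19 can be checked numerically. The sketch below assumes that F(i, p) is the fraction of the original (one-processor) execution time during which exactly i processors are usable, and that the fractions for i = 1..p sum to 1. Under that assumption each fraction shrinks by a factor of i, so the speedup is 1 divided by the sum of F(i, p)/i. The function name `speedup` and the list-based input are illustrative choices, not part of the exercise.

```python
def speedup(fractions):
    """Speedup for p = len(fractions) processors.

    fractions[i - 1] holds F(i, p), the assumed fraction of the
    original execution time in which exactly i processors are usable.
    """
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("the fractions F(i, p) must sum to 1")
    # Time spent with i processors in use shrinks by a factor of i.
    new_time = sum(f / i for i, f in enumerate(fractions, start=1))
    return 1.0 / new_time


if __name__ == "__main__":
    # 20% of the time only 1 processor is usable, 80% all 4 are usable.
    print(speedup([0.2, 0.0, 0.0, 0.8]))
```

With all of the time spent at i = 1 the function returns 1, and with all of it at i = p it returns p, which matches the two limiting cases of Amdahl's law.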
5.20 [15/20/10] <5.1> In this exercise, we examine the effect of the interconnection network
topology on the clock cycles per instruction (CPI) of programs running on a 64-processor
distributed-memory multiprocessor. The processor clock rate is 3.3 GHz and the base CPI