Thread-Level Parallelism - Computer Architecture: A Quantitative Approach

Hardware Reference

In-Depth Information

c. [20] <5.3> P1: read 120

P1: read 128

P1: read 130

d. [20] <5.3> P1: read 100

P1: write 108 <-- 48

P1: write 130 <-- 78

FIGURE 5.36 Snooping coherence latencies .

5.3 [20] <5.2> Many snooping coherence protocols have additional states, state transitions, or

bus transactions to reduce the overhead of maintaining cache coherency. In Implementa-

tion 1 of Exercise 5.2 , misses are incurring fewer stall cycles when they are supplied by

cache than when they are supplied by memory. Some coherence protocols try to improve

performance by increasing the frequency of this case. A common protocol optimization is

to introduce an Owned state (usually denoted O). The Owned state behaves like the Shared

state in that nodes may only read Owned blocks, but it behaves like the Modified state in

that nodes must supply data on other nodes' read and write misses to Owned blocks. A

read miss to a block in either the Modified or Owned states supplies data to the requesting

node and transitions to the Owned state. A write miss to a block in either state Modified or

Owned supplies data to the requesting node and transitions to state Invalid. This optim-

ized MOSI protocol only updates memory when a node replaces a block in state Modified

or Owned. Draw new protocol diagrams with the additional state and transitions.

5.4 [20/20/20/20] <5.2> For the following code sequences and the timing parameters for the

two implementations in Figure 5.36 , compute the total stall cycles for the base MSI protocol

and the optimized MOSI protocol in Exercise 5.3 . Assume that state transitions that do not

require bus transactions incur no additional stall cycles.

a. [20] <5.2> P0: read 110

P3: read 110

P0: read 110

b. [20] <5.2> P1: read 120

P3: read 120

P0: read 120

c. [20] <5.2> P0: write 120 <-- 80

P3: read 120

P0: read 120

Search WWH ::

Custom Search

Home