Review of Memory Hierarchy - Computer Architecture: A Quantitative Approach

B.5 [10/10/10/10] <B.2> You are building a system around a processor with in-order execution

that runs at 1.1 GHz and has a CPI of 0.7 excluding memory accesses. The only instructions

that read or write data from memory are loads (20% of all instructions) and stores (5% of

all instructions). The memory system for this computer is composed of a split L1 cache

that imposes no penalty on hits. Both the I-cache and D-cache are direct mapped and hold

32 KB each. The I-cache has a 2% miss rate and 32-byte blocks, and the D-cache is write-

through with a 5% miss rate and 16-byte blocks. There is a write buffer on the D-cache that

eliminates stalls for 95% of all writes. The 512 KB write-back, unified L2 cache has 64-byte

blocks and an access time of 15 ns. It is connected to the L1 cache by a 128-bit data bus that

runs at 266 MHz and can transfer one 128-bit word per bus cycle. Of all memory references

sent to the L2 cache in this system, 80% are satisfied without going to main memory. Also,

50% of all blocks replaced are dirty. The 128-bit-wide main memory has an access latency

of 60 ns, after which any number of bus words may be transferred at the rate of one per

cycle on the 128-bit-wide 133 MHz main memory bus.

a. [10] <B.2> What is the average memory access time for instruction accesses?

b. [10] <B.2> What is the average memory access time for data reads?

c. [10] <B.2> What is the average memory access time for data writes?

d. [10] <B.2> What is the overall CPI, including memory accesses?
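
As a starting point for parts (a) through (d), the timing building blocks can be sketched in Python. This is a minimal sketch, not a solution. The function names are hypothetical, and the sketch assumes that an L1 miss pays the L2 access time plus the bus transfer of one L1 block in series, and that a main-memory fetch pays the 60 ns latency plus the transfer of one 64-byte L2 block. Dirty-block write-backs and the write buffer are deliberately left out.

```python
import math

# Parameters taken from the problem statement.
CPU_HZ = 1.1e9             # processor clock
L2_BUS_HZ = 266e6          # L1 <-> L2 bus clock
MEM_BUS_HZ = 133e6         # main memory bus clock
BUS_WIDTH_BYTES = 128 // 8 # both buses are 128 bits wide
L2_ACCESS_NS = 15.0
MEM_LATENCY_NS = 60.0
L2_BLOCK_BYTES = 64


def bus_cycles(block_bytes, bus_width_bytes=BUS_WIDTH_BYTES):
    """Bus cycles needed to move one block, one bus word per cycle."""
    return math.ceil(block_bytes / bus_width_bytes)


def transfer_ns(block_bytes, bus_hz):
    """Time in nanoseconds to move one block over a bus clocked at bus_hz."""
    return bus_cycles(block_bytes) * 1e9 / bus_hz


def l2_hit_service_ns(l1_block_bytes):
    """Assumed cost of an L1 miss that hits in L2: access time, then transfer."""
    return L2_ACCESS_NS + transfer_ns(l1_block_bytes, L2_BUS_HZ)


def memory_fetch_ns(l2_block_bytes=L2_BLOCK_BYTES):
    """Assumed cost of fetching one L2 block from main memory."""
    return MEM_LATENCY_NS + transfer_ns(l2_block_bytes, MEM_BUS_HZ)


def ns_to_cpu_cycles(ns):
    """Convert a time in nanoseconds to processor clock cycles."""
    return ns * CPU_HZ / 1e9
```

Calling l2_hit_service_ns(32) gives the assumed L2 service time for an I-cache block, and l2_hit_service_ns(16) the same for a D-cache block. Combining these with the stated miss rates, the 80% L2 hit fraction, and the 50% dirty-block fraction is the part the exercise asks for.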

B.6 [10/15/15] <B.2> Converting miss rate (misses per reference) into misses per instruction

relies upon two factors: references per instruction fetched and the fraction of fetched

instructions that actually commits.

a. [10] <B.2> The formula for misses per instruction on page B-5 is written first in terms of

three factors: miss rate, memory accesses, and instruction count. Each of these factors

represents actual events. What is different about writing misses per instruction as miss

rate times the factor memory accesses per instruction?

b. [15] <B.2> Speculative processors will fetch instructions that do not commit. The

formula for misses per instruction on page B-5 refers to misses per instruction on the

execution path, that is, only the instructions that must actually be executed to carry out

the program. Convert the formula for misses per instruction on page B-5 into one that

uses only miss rate, references per instruction fetched, and fraction of fetched

instructions that commit. Why rely upon these factors rather than those in the formula on

page B-5?

c. [15] <B.2> The conversion in part (b) could yield an incorrect value to the extent that

the value of the factor references per instruction fetched is not equal to the number of

references for any particular instruction. Rewrite the formula of part (b) to correct this

deficiency.
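
The quantities named in this exercise can be sketched in Python. The first function follows the three factors listed in part (a): miss rate, memory accesses, and instruction count. The second is only one possible reading of the conversion described in the opening sentence. It assumes that the miss rate is measured over all references, including those made by fetched instructions that never commit, and that misses are charged to committed instructions. Both function names are hypothetical.

```python
def misses_per_instruction(miss_rate, memory_accesses, instruction_count):
    """Misses per instruction from miss rate, memory accesses, and instruction count."""
    return miss_rate * memory_accesses / instruction_count


def misses_per_committed_instruction(miss_rate, refs_per_fetched, commit_fraction):
    """Assumed conversion for a speculative processor.

    Total misses     = miss_rate * refs_per_fetched * fetched
    Committed instrs = commit_fraction * fetched
    The number of fetched instructions cancels in the ratio.
    """
    if not 0.0 < commit_fraction <= 1.0:
        raise ValueError("commit_fraction must be in (0, 1]")
    return miss_rate * refs_per_fetched / commit_fraction
```

When every fetched instruction commits (commit_fraction equal to 1), the two functions agree, since memory accesses divided by instruction count is then the same as references per instruction fetched.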

B.7 [20] <B.1, B.3> In systems with a write-through L1 cache backed by a write-back L2 cache

instead of main memory, a merging write buffer can be simplified. Explain how this can be

done. Are there situations where having a full write buffer (instead of the simple version

you've just proposed) could be helpful?

B.8 [20/20/15/25] <B.3> The LRU replacement policy is based on the assumption that if

address A1 is accessed less recently than address A2 in the past, then A2 will be accessed

again before A1 in the future. Hence, A2 is given priority over A1. Discuss how this

assumption fails to hold when a loop larger than the instruction cache is being

continuously executed. For example, consider a fully associative 128-byte instruction cache with a

4-byte block (every block can exactly hold one instruction). The cache uses an LRU

replacement policy.
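
The behavior described above can be explored with a small simulation. This is a minimal sketch under stated assumptions: instructions are 4 bytes and laid out contiguously from address 0, the loop body is straight-line code with no internal branches, and the function name simulate_lru and its parameters are hypothetical.

```python
from collections import OrderedDict


def simulate_lru(loop_length, iterations, cache_bytes=128, block_bytes=4,
                 instr_bytes=4):
    """Count hits and misses for a loop run on a fully associative LRU cache.

    loop_length is the number of instructions in the loop body.
    Returns a (hits, misses) tuple.
    """
    num_blocks = cache_bytes // block_bytes
    cache = OrderedDict()  # keys are block numbers, ordered oldest first
    hits = misses = 0
    for _ in range(iterations):
        for i in range(loop_length):
            block = (i * instr_bytes) // block_bytes
            if block in cache:
                cache.move_to_end(block)       # mark as most recently used
                hits += 1
            else:
                misses += 1
                if len(cache) >= num_blocks:
                    cache.popitem(last=False)  # evict least recently used
                cache[block] = True
    return hits, misses
```

With the 128-byte cache holding 32 blocks, simulate_lru(32, 10) misses only on the first pass through the loop, whereas simulate_lru(33, 10) never hits: each miss evicts exactly the block that will be needed soonest.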
