B.5 [10/10/10/10] <B.2> You are building a system around a processor with in-order execution
that runs at 1.1 GHz and has a CPI of 0.7 excluding memory accesses. The only instructions
that read or write data from memory are loads (20% of all instructions) and stores (5% of
all instructions). The memory system for this computer is composed of a split L1 cache
that imposes no penalty on hits. Both the I-cache and D-cache are direct mapped and hold
32 KB each. The I-cache has a 2% miss rate and 32-byte blocks, and the D-cache is write-
through with a 5% miss rate and 16-byte blocks. There is a write buffer on the D-cache that
eliminates stalls for 95% of all writes. The 512 KB write-back, unified L2 cache has 64-byte
blocks and an access time of 15 ns. It is connected to the L1 cache by a 128-bit data bus that
runs at 266 MHz and can transfer one 128-bit word per bus cycle. Of all memory references
sent to the L2 cache in this system, 80% are satisfied without going to main memory. Also,
50% of all blocks replaced are dirty. The 128-bit-wide main memory has an access latency
of 60 ns, after which any number of bus words may be transferred at the rate of one per
cycle on the 128-bit-wide 133 MHz main memory bus.
a. [10] <B.2> What is the average memory access time for instruction accesses?
b. [10] <B.2> What is the average memory access time for data reads?
c. [10] <B.2> What is the average memory access time for data writes?
d. [10] <B.2> What is the overall CPI, including memory accesses?
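As a starting point for part (a), the latencies in the problem statement can be combined as in the sketch below. This is only one possible reading of the exercise, not a reference solution: it assumes an L1 miss costs the L2 access time plus the bus cycles needed to move one L1 block, that an L2 miss costs the memory latency plus the bus cycles needed to move one 64-byte L2 block, and that a dirty victim adds one more such memory transfer. All function and constant names are hypothetical.

```python
# Sketch of B.5(a) under the assumptions stated in the lead-in.
L2_BUS_HZ = 266e6    # L1-L2 bus clock
MEM_BUS_HZ = 133e6   # main memory bus clock
BUS_BYTES = 16       # both buses are 128 bits wide


def l2_service_ns(l1_block_bytes):
    """L2 access time plus the time to transfer one L1 block over the bus."""
    bus_cycles = l1_block_bytes // BUS_BYTES
    return 15.0 + bus_cycles * 1e9 / L2_BUS_HZ


def memory_service_ns(l2_block_bytes=64):
    """Memory latency plus the time to transfer one L2 block over the bus."""
    bus_cycles = l2_block_bytes // BUS_BYTES
    return 60.0 + bus_cycles * 1e9 / MEM_BUS_HZ


def l1_miss_penalty_ns(l1_block_bytes, l2_hit_rate=0.80, dirty_fraction=0.50):
    """Average cost of an L1 miss.

    Assumption: every L2 miss replaces a block, and a dirty victim costs
    one additional memory block transfer for the write-back.
    """
    l2_miss_rate = 1.0 - l2_hit_rate
    memory_part = l2_miss_rate * (1.0 + dirty_fraction) * memory_service_ns()
    return l2_service_ns(l1_block_bytes) + memory_part


def instruction_amat_ns(icache_miss_rate=0.02, icache_block_bytes=32):
    """L1 hits impose no penalty, so AMAT is miss rate times miss penalty."""
    return icache_miss_rate * l1_miss_penalty_ns(icache_block_bytes)


print(f"{instruction_amat_ns():.3f} ns per instruction fetch")
```

Under these assumptions the result is roughly 0.99 ns per instruction access. Parts (b) and (c) would reuse `l1_miss_penalty_ns` with the 16-byte D-cache block size, and part (c) would also need the write-buffer behavior, which this sketch does not model.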
B.6 [10/15/15] <B.2> Converting miss rate (misses per reference) into misses per instruction
relies upon two factors: references per instruction fetched and the fraction of fetched in-
structions that actually commits.
a. [10] <B.2> The formula for misses per instruction on page B-5 is written first in terms of
three factors: miss rate, memory accesses, and instruction count. Each of these factors
represents actual events. What is different about writing misses per instruction as miss
rate times the factor memory accesses per instruction?
b. [15] <B.2> Speculative processors will fetch instructions that do not commit. The for-
mula for misses per instruction on page B-5 refers to misses per instruction on the ex-
ecution path, that is, only the instructions that must actually be executed to carry out
the program. Convert the formula for misses per instruction on page B-5 into one that
uses only miss rate, references per instruction fetched, and fraction of fetched instruc-
tions that commit. Why rely upon these factors rather than those in the formula on
page B-5?
c. [15] <B.2> The conversion in part (b) could yield an incorrect value to the extent that
the value of the factor references per instruction fetched is not equal to the number of
references for any particular instruction. Rewrite the formula of part (b) to correct this
deficiency.
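The relationships discussed in B.6 can be illustrated numerically. The sketch below assumes that misses per instruction equals miss rate times memory accesses per instruction, and that on a speculative processor the misses caused by all fetched instructions are charged to the instructions that commit. The function names and the sample numbers are hypothetical and are not taken from the text.

```python
def misses_per_instruction(miss_rate, accesses_per_instruction):
    """Misses per instruction as miss rate times accesses per instruction."""
    return miss_rate * accesses_per_instruction


def misses_per_committed_instruction(miss_rate, refs_per_fetched, fraction_committed):
    """Charge the misses of every fetched instruction to committed ones.

    fetched / committed = 1 / fraction_committed, so dividing by the
    committed fraction scales per-fetched misses to per-committed misses.
    """
    if not 0.0 < fraction_committed <= 1.0:
        raise ValueError("fraction_committed must be in (0, 1]")
    return miss_rate * refs_per_fetched / fraction_committed


# Hypothetical values: 2% miss rate, 1.25 references per fetched
# instruction, and 80% of fetched instructions committing.
print(misses_per_instruction(0.02, 1.25))
print(misses_per_committed_instruction(0.02, 1.25, 0.8))
```

The second function reduces to the first when every fetched instruction commits (`fraction_committed = 1.0`). Both use a single average reference count, which is the limitation part (c) asks about.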
B.7 [20] <B.1, B.3> In systems with a write-through L1 cache backed by a write-back L2 cache
instead of main memory, a merging write buffer can be simplified. Explain how this can be
done. Are there situations where having a full write buffer (instead of the simple version
you've just proposed) could be helpful?
B.8 [20/20/15/25] <B.3> The LRU replacement policy is based on the assumption that if ad-
dress A1 is accessed less recently than address A2 in the past, then A2 will be accessed
again before A1 in the future. Hence, A2 is given priority over A1. Discuss how this assumption
fails to hold when a loop larger than the instruction cache is being continuously
executed. For example, consider a fully associative 128-byte instruction cache with a
4-byte block (every block holds exactly one instruction). The cache uses an LRU replacement
policy.
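The failure mode described above can be observed with a small simulation. The sketch below models the 32-block fully associative cache (128 bytes / 4-byte blocks) with LRU replacement and runs a straight-line loop through it repeatedly. The loop lengths of 32 and 33 instructions are hypothetical choices for illustration, and `lru_hit_count` is a made-up helper name.

```python
from collections import OrderedDict


def lru_hit_count(loop_length, iterations, cache_blocks=32):
    """Count hits when a loop of `loop_length` one-block instructions
    runs `iterations` times through a fully associative LRU cache."""
    cache = OrderedDict()  # keys ordered from least to most recently used
    hits = 0
    for _ in range(iterations):
        for block in range(loop_length):
            if block in cache:
                hits += 1
                cache.move_to_end(block)       # mark as most recently used
            else:
                if len(cache) >= cache_blocks:
                    cache.popitem(last=False)  # evict the least recently used block
                cache[block] = True
    return hits


# A loop that fits in the cache hits on every access after the first pass.
print(lru_hit_count(loop_length=32, iterations=10))
# A loop just one instruction larger than the cache never hits.
print(lru_hit_count(loop_length=33, iterations=10))
```

In the 33-instruction case, the block about to be fetched is always the one LRU evicted most recently, so the recency assumption picks the worst possible victim on every miss.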