each cache size.
a. [12] <2.6> What is the system page size?
b. [15] <2.6> How many entries are there in the translation lookaside buffer (TLB)?
c. [15] <2.6> What is the miss penalty for the TLB?
d. [20] <2.6> What is the associativity of the TLB?
2.6 [20/20] <2.6> In multiprocessor memory systems, the lower levels of the memory hierarchy
may not be saturable by a single processor, but they should be saturable by multiple
processors working together. Modify the code in Figure 2.29 and run multiple copies at the
same time (one way to do this is sketched after part (b) below). Can you determine:
a. [20] <2.6> How many actual processors are in your computer system and how many
system processors are just additional multithreaded contexts?
b. [20] <2.6> How many memory controllers does your system have?
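One simple way to run several copies at once is to fork one child process per copy. The listing below is a minimal sketch of that idea, not the code of Figure 2.29 itself: the stride-walk loop stands in for the timing code of that figure, and NCOPIES, ARRAY_BYTES, and STRIDE are illustrative values you would vary on your own machine.

/* Minimal sketch: fork NCOPIES processes, each walking a large array, so that
 * lower levels of the memory hierarchy see concurrent demand.  The walk loop
 * is only a stand-in for the timing code of Figure 2.29. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>

#define NCOPIES     4                        /* illustrative: number of concurrent copies      */
#define ARRAY_BYTES (256UL * 1024 * 1024)    /* illustrative: larger than the last-level cache */
#define STRIDE      64                       /* one access per block, assuming 64-byte blocks  */

static void walk_memory(void)
{
    char *buf = malloc(ARRAY_BYTES);
    volatile unsigned long sum = 0;          /* volatile so the reads are not optimized away */
    if (!buf) { perror("malloc"); exit(1); }
    memset(buf, 1, ARRAY_BYTES);             /* touch every page so memory is really allocated */
    for (int pass = 0; pass < 16; pass++)
        for (unsigned long i = 0; i < ARRAY_BYTES; i += STRIDE)
            sum += buf[i];                   /* roughly one miss per block once the array exceeds the cache */
    free(buf);
    printf("pid %d done (checksum %lu)\n", (int)getpid(), (unsigned long)sum);
}

int main(void)
{
    for (int c = 0; c < NCOPIES; c++) {
        pid_t pid = fork();
        if (pid < 0) { perror("fork"); return 1; }
        if (pid == 0) { walk_memory(); return 0; }   /* each child runs one copy */
    }
    for (int c = 0; c < NCOPIES; c++)
        wait(NULL);                                  /* parent waits for all copies */
    return 0;
}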
2.7 [20] <2.6> Can you think of a way to test some of the characteristics of an instruction cache
using a program? Hint: The compiler may generate a large number of nonobvious instructions
from a piece of code. Try to use simple arithmetic instructions of known length in your
instruction set architecture (ISA).
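One way to act on the hint is to let the preprocessor replicate a simple arithmetic statement a known number of times, producing functions whose code footprint is roughly known, and then compare the time per executed add for a body that fits in the instruction cache against one that does not. The listing below is a minimal sketch of this approach; the repetition counts, call counts, and the assumption that each add compiles to a fixed-length instruction sequence are illustrative, and the code should be compiled with optimization disabled so the adds are not folded away.

/* Minimal sketch: build functions whose straight-line code size is roughly
 * known (N copies of one add), then time repeated calls.  When the code
 * footprint exceeds the instruction cache, the time per add rises.
 * Compile without optimization (e.g., -O0) so the adds are not folded. */
#include <stdio.h>
#include <time.h>

#define REP10(x)   x x x x x x x x x x
#define REP100(x)  REP10(REP10(x))
#define REP1000(x) REP10(REP100(x))

#define ADD s += 1;          /* one simple arithmetic statement of (roughly) known length */

static long body_small(long s)  { REP100(ADD)  return s; }    /* ~hundreds of bytes of code */
static long body_large(long s)  { REP1000(ADD) REP1000(ADD) REP1000(ADD) REP1000(ADD)
                                  REP1000(ADD) REP1000(ADD) REP1000(ADD) REP1000(ADD)
                                  return s; }                  /* ~tens of KB of code        */

static double time_calls(long (*f)(long), int calls)
{
    struct timespec t0, t1;
    long s = 0;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < calls; i++) s = f(s);
    clock_gettime(CLOCK_MONOTONIC, &t1);
    printf("(checksum %ld)\n", s);           /* keep the calls from being discarded */
    return (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
}

int main(void)
{
    /* Report time per executed add so the two bodies are directly comparable. */
    printf("small body (~100 adds):  %.2f ns/add\n",
           1e9 * time_calls(body_small, 200000) / (200000.0 * 100));
    printf("large body (~8000 adds): %.2f ns/add\n",
           1e9 * time_calls(body_large, 2000)   / (2000.0 * 8000));
    return 0;
}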
Exercises
2.8 [12/12/15] <2.2> The following questions investigate the impact of small and simple caches
using CACTI, and assume a 65 nm (0.065 μm) technology. (CACTI is available in an online
form at http://quid.hpl.hp.com:9081/cacti/.)
a. [12] <2.2> Compare the access times of 64 KB caches with 64 byte blocks and a single
bank. What are the relative access times of two-way and four-way set associative
caches in comparison to a direct mapped organization?
b. [12] <2.2> Compare the access times of four-way set associative caches with 64 byte
blocks and a single bank. What are the relative access times of 32 KB and 64 KB caches
in comparison to a 16 KB cache?
c. [15] <2.2> For a 64 KB cache, find the cache associativity between 1 and 8 with the lowest
average memory access time, given that the misses per instruction for a certain workload
suite are 0.00664 for a direct-mapped cache, 0.00366 for a two-way set associative cache,
0.000987 for a four-way set associative cache, and 0.000266 for an eight-way set associative
cache. Overall, there are 0.3 data references per instruction. Assume cache misses take
10 ns in all models. To calculate the hit time in cycles, use the cycle time output by
CACTI, which corresponds to the maximum frequency at which a cache can operate without
any bubbles in the pipeline.
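The calculation in part (c) uses the usual formula, average memory access time = hit time + miss rate × miss penalty, where the miss rate is the misses per instruction divided by the 0.3 data references per instruction, and both the hit time and the 10 ns miss penalty are converted to cycles using the CACTI cycle time. The listing below is a sketch of that arithmetic only; the access-time and cycle-time values are placeholders that must be replaced with the numbers CACTI reports for each 64 KB configuration.

/* Sketch of the AMAT comparison for Exercise 2.8(c).  The access times and
 * cycle times are placeholders: fill them in with the values CACTI reports
 * for a 64 KB cache at each associativity.  The miss-per-instruction numbers
 * come from the exercise.  Link with -lm for ceil(). */
#include <stdio.h>
#include <math.h>

int main(void)
{
    const int    assoc[4]          = { 1, 2, 4, 8 };
    const double miss_per_inst[4]  = { 0.00664, 0.00366, 0.000987, 0.000266 };
    const double access_time_ns[4] = { /* from CACTI */ 1.0, 1.0, 1.0, 1.0 };  /* placeholders */
    const double cycle_time_ns[4]  = { /* from CACTI */ 1.0, 1.0, 1.0, 1.0 };  /* placeholders */
    const double refs_per_inst     = 0.3;    /* data references per instruction (given) */
    const double miss_time_ns      = 10.0;   /* cache miss service time (given)         */

    for (int i = 0; i < 4; i++) {
        double miss_rate   = miss_per_inst[i] / refs_per_inst;
        double hit_cyc     = ceil(access_time_ns[i] / cycle_time_ns[i]);   /* round up to whole cycles */
        double penalty_cyc = ceil(miss_time_ns / cycle_time_ns[i]);
        double amat_cycles = hit_cyc + miss_rate * penalty_cyc;
        double amat_ns     = amat_cycles * cycle_time_ns[i];   /* ns allows comparison across clock rates */
        printf("%d-way: AMAT = %.4f cycles = %.4f ns\n", assoc[i], amat_cycles, amat_ns);
    }
    return 0;
}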
2.9 [12/15/15/10] <2.2> You are investigating the possible benefits of a way-predicting L1
cache. Assume that a 64 KB four-way set associative, single-banked L1 data cache is the
cycle time limiter in a system. As an alternative cache organization, you are considering a
way-predicted cache modeled as a 64 KB direct-mapped cache with 80% prediction accuracy.
Unless stated otherwise, assume that a mispredicted way access that hits in the cache
takes one more cycle. Assume the miss rates and the miss penalties from Exercise 2.8 part (c).
a. [12] <2.2> What is the average memory access time of the current cache (in cycles)
versus the way-predicted cache?
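For part (a), the way-predicted cache's effective hit time is a weighted average of the correctly predicted case and the mispredicted-but-hitting case: hit time = accuracy × t_fast + (1 − accuracy) × (t_fast + 1) cycles. If both organizations see the same miss rate and miss penalty (taken from Exercise 2.8 part (c)), the AMAT difference comes from this hit-time term. The listing below sketches that blend only; the one-cycle fast hit time is an assumption to be replaced with the value implied by your CACTI results.

/* Sketch for Exercise 2.9(a): effective hit time of the way-predicted cache. */
#include <stdio.h>

int main(void)
{
    const double accuracy      = 0.80;  /* way-prediction accuracy (given)                        */
    const double hit_fast_cyc  = 1.0;   /* placeholder: hit time when the predicted way is right  */
    const double mispred_extra = 1.0;   /* one additional cycle for a mispredicted way that hits  */

    double hit_wp = accuracy * hit_fast_cyc
                  + (1.0 - accuracy) * (hit_fast_cyc + mispred_extra);
    printf("effective way-predicted hit time: %.2f cycles\n", hit_wp);
    return 0;
}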