b. [15] <2.2> If all other components could operate with the faster way-predicted cache
cycle time (including the main memory), what would be the impact on performance
from using the way-predicted cache?
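A minimal sketch of the reasoning for part (b): if every component, including main memory, scaled to the faster way-predicted cycle time, execution time would scale with the cycle time itself. The access times below are placeholders, not values from the exercise; substitute the CACTI results from part (a).

    # Hedged sketch for part (b). The two times are hypothetical CACTI outputs.
    t_normal = 1.10    # ns, access time of the conventional cache (placeholder)
    t_waypred = 0.90   # ns, access time of the way-predicted cache (placeholder)

    # If all components scale, speedup equals the cycle-time ratio.
    speedup = t_normal / t_waypred
    print(f"Speedup if all components scale: {speedup:.2f}x")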
c. [15] <2.2> Way-predicted caches have usually been used only for instruction caches that feed an instruction queue or buffer. Imagine that you want to try out way prediction on a data cache. Assume that you have 80% prediction accuracy and that subsequent operations (e.g., data cache access of other instructions, dependent operations) are issued assuming a correct way prediction. Thus, a way misprediction necessitates a pipe flush and replay trap, which requires 15 cycles. Is the change in average memory access time per load instruction with data cache way prediction positive or negative, and how much is it?
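One way to set up part (c) is as a weighted average: a correctly predicted load pays the fast hit time, while a misprediction pays the hit time plus the 15-cycle flush-and-replay. Only the 80% accuracy and the 15-cycle penalty come from the exercise; the hit times below are placeholders to be replaced with your part (a) results.

    # Hedged sketch for part (c): change in AMAT per load with way prediction.
    accuracy = 0.80          # from the exercise
    replay_penalty = 15      # cycles, from the exercise
    t_fast = 2               # cycles, hypothetical way-predicted hit time
    t_slow = 3               # cycles, hypothetical conventional hit time

    amat_waypred = accuracy * t_fast + (1 - accuracy) * (t_fast + replay_penalty)
    delta = amat_waypred - t_slow
    print(f"Change in AMAT per load: {delta:+.2f} cycles")  # positive => slower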
d. [10] <2.2> As an alternative to way prediction, many large associative L2 caches serialize tag and data access, so that only the required dataset array needs to be activated. This saves power but increases the access time. Use CACTI's detailed Web interface for a 0.065 μm process 1 MB four-way set associative cache with 64 byte blocks, 144 bits read out, 1 bank, only 1 read/write port, 30 bit tags, and ITRS-HP technology with global wires. What is the ratio of the access times for serializing tag and data access in comparison to parallel access?
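A sketch of the comparison in part (d), under the usual assumption that serialized access must finish the tag lookup before reading the selected data array, while parallel access overlaps the two. The component times are placeholders, not CACTI output; run CACTI with the listed configuration and substitute its tag/data timing breakdown.

    # Hedged sketch for part (d). Both times below are hypothetical.
    t_tag = 0.8    # ns, tag-array access time (placeholder)
    t_data = 1.2   # ns, data-array access time (placeholder)

    t_serial = t_tag + t_data        # tag lookup, then the one needed data array
    t_parallel = max(t_tag, t_data)  # tag and data arrays accessed together
    print(f"Serial/parallel access-time ratio: {t_serial / t_parallel:.2f}")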
2.10 [10/12] <2.2> You have been asked to investigate the relative performance of a banked versus pipelined L1 data cache for a new microprocessor. Assume a 64 KB two-way set associative cache with 64 byte blocks. The pipelined cache would consist of three pipestages, similar in capacity to the Alpha 21264 data cache. A banked implementation would consist of two 32 KB two-way set associative banks. Use CACTI and assume a 65 nm (0.065 μm) technology to answer the following questions. The cycle time output in the Web version shows at what frequency a cache can operate without any bubbles in the pipeline.
a. [10] <2.2> What is the cycle time of the cache in comparison to its access time, and how
many pipestages will the cache take up (to two decimal places)?
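For 2.10(a), the number of pipestages a cache occupies is its access time divided by its cycle time, since the cycle time is how often a new access can be launched. The times below are placeholders; substitute the CACTI results for the 64 KB cache at 65 nm.

    # Hedged sketch for 2.10(a). Both times are hypothetical CACTI outputs.
    access_time = 1.80  # ns (placeholder)
    cycle_time = 0.65   # ns (placeholder)

    pipestages = access_time / cycle_time
    print(f"Pipestages occupied: {pipestages:.2f}")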
b. [12] <2.2> Compare the area and total dynamic read energy per access of the pipelined
design versus the banked design. State which takes up less area and which requires
more power, and explain why that might be.
2.11 [12/15] <2.2> Consider the usage of critical word first and early restart on L2 cache misses. Assume a 1 MB L2 cache with 64 byte blocks and a refill path that is 16 bytes wide. Assume that the L2 can be written with 16 bytes every 4 processor cycles, the time to receive the first 16 byte block from the memory controller is 120 cycles, each additional 16 byte block from main memory requires 16 cycles, and data can be bypassed directly into the read port of the L2 cache. Ignore any cycles to transfer the miss request to the L2 cache and the requested data to the L1 cache.
a. [12] <2.2> How many cycles would it take to service an L2 cache miss with and without
critical word first and early restart?
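A hedged worked sketch for part (a), under one common reading of the stated parameters: the 64-byte block arrives as four 16-byte beats, the first taking 120 cycles and each later beat 16 cycles, with data bypassed directly to the L2 read port.

    # Hedged sketch for 2.11(a), using only numbers given in the exercise.
    block_bytes, beat_bytes = 64, 16
    first_beat, later_beat = 120, 16
    beats = block_bytes // beat_bytes  # 4 beats per block

    # Without critical word first / early restart: wait for the whole block.
    t_whole_block = first_beat + (beats - 1) * later_beat  # 120 + 3*16

    # With critical word first: the needed beat arrives first and is bypassed.
    t_critical_first = first_beat

    print(f"Without CWF/early restart: {t_whole_block} cycles")
    print(f"With CWF/early restart:    {t_critical_first} cycles")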
b. [15] <2.2> Do you think critical word first and early restart would be more important for L1 caches or L2 caches, and what factors would contribute to their relative importance?
2.12 [12/12] <2.2> You are designing a write buffer between a write-through L1 cache and a
write-back L2 cache. The L2 cache write data bus is 16 B wide and can perform a write to
an independent cache address every 4 processor cycles.
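A minimal sketch of the stated setup only (the exercise's sub-parts are not shown here): the write buffer drains one 16 B entry to the L2 every 4 cycles, so its occupancy depends on how that drain rate compares with the rate at which the write-through L1 produces stores. The store arrival rate below is purely hypothetical.

    # Hedged sketch of the 2.12 setup, not an answer to its sub-parts.
    drain_interval = 4   # cycles per 16 B write accepted by the L2 (from the exercise)
    store_interval = 5   # cycles between L1 write-throughs (hypothetical)

    # If stores arrive faster than entries drain, occupancy grows until the
    # buffer fills and the processor must stall on further stores.
    keeps_up = store_interval >= drain_interval
    print(f"Buffer keeps up with stores: {keeps_up}")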