b. [15] <2.2> If all other components could operate with the faster way-predicted cache
cycle time (including the main memory), what would be the impact on performance
from using the way-predicted cache?
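A minimal sketch of the reasoning for part (b): if every component, including main memory, scaled to the faster way-predicted cycle time, execution time would scale with the cycle time itself. The access times below are placeholders, not values from the exercise; substitute the CACTI results from part (a).

    # Hedged sketch for part (b). The two times are hypothetical CACTI outputs.
    t_normal = 1.10    # ns, access time of the conventional cache (placeholder)
    t_waypred = 0.90   # ns, access time of the way-predicted cache (placeholder)

    # If all components scale, speedup equals the cycle-time ratio.
    speedup = t_normal / t_waypred
    print(f"Speedup if all components scale: {speedup:.2f}x")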
c. [15] <2.2> Way-predicted caches have usually been used only for instruction caches that feed an instruction queue or buffer. Imagine that you want to try out way prediction on a data cache. Assume that you have 80% prediction accuracy and that subsequent operations (e.g., data cache access of other instructions, dependent operations) are issued assuming a correct way prediction. Thus, a way misprediction necessitates a pipe flush and replay trap, which requires 15 cycles. Is the change in average memory access time per load instruction with data cache way prediction positive or negative, and how much is it?
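One way to set up part (c) is as a weighted average: a correctly predicted load pays the fast hit time, while a misprediction pays the hit time plus the 15-cycle flush-and-replay. Only the 80% accuracy and the 15-cycle penalty come from the exercise; the hit times below are placeholders to be replaced with your part (a) results.

    # Hedged sketch for part (c): change in AMAT per load with way prediction.
    accuracy = 0.80          # from the exercise
    replay_penalty = 15      # cycles, from the exercise
    t_fast = 2               # cycles, hypothetical way-predicted hit time
    t_slow = 3               # cycles, hypothetical conventional hit time

    amat_waypred = accuracy * t_fast + (1 - accuracy) * (t_fast + replay_penalty)
    delta = amat_waypred - t_slow
    print(f"Change in AMAT per load: {delta:+.2f} cycles")  # positive => slower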
d. [10] <2.2> As an alternative to way prediction, many large associative L2 caches serialize tag and data access, so that only the required dataset array needs to be activated. This saves power but increases the access time. Use CACTI's detailed Web interface for a 0.065 μm process 1 MB four-way set associative cache with 64 byte blocks, 144 bits read out, 1 bank, only 1 read/write port, 30 bit tags, and ITRS-HP technology with global wires. What is the ratio of the access times for serializing tag and data access in comparison to parallel access?
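A sketch of the comparison in part (d), under the usual assumption that serialized access must finish the tag lookup before reading the selected data array, while parallel access overlaps the two. The component times are placeholders, not CACTI output; run CACTI with the listed configuration and substitute its tag/data timing breakdown.

    # Hedged sketch for part (d). Both times below are hypothetical.
    t_tag = 0.8    # ns, tag-array access time (placeholder)
    t_data = 1.2   # ns, data-array access time (placeholder)

    t_serial = t_tag + t_data        # tag lookup, then the one needed data array
    t_parallel = max(t_tag, t_data)  # tag and data arrays accessed together
    print(f"Serial/parallel access-time ratio: {t_serial / t_parallel:.2f}")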
2.10 [10/12] <2.2> You have been asked to investigate the relative performance of a banked versus pipelined L1 data cache for a new microprocessor. Assume a 64 KB two-way set associative cache with 64 byte blocks. The pipelined cache would consist of three pipestages, similar in capacity to the Alpha 21264 data cache. A banked implementation would consist of two 32 KB two-way set associative banks. Use CACTI and assume a 65 nm (0.065 μm) technology to answer the following questions. The cycle time output in the Web version shows at what frequency a cache can operate without any bubbles in the pipeline.
a. [10] <2.2> What is the cycle time of the cache in comparison to its access time, and how
many pipestages will the cache take up (to two decimal places)?
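For 2.10(a), the number of pipestages a cache occupies is its access time divided by its cycle time, since the cycle time is how often a new access can be launched. The times below are placeholders; substitute the CACTI results for the 64 KB cache at 65 nm.

    # Hedged sketch for 2.10(a). Both times are hypothetical CACTI outputs.
    access_time = 1.80  # ns (placeholder)
    cycle_time = 0.65   # ns (placeholder)

    pipestages = access_time / cycle_time
    print(f"Pipestages occupied: {pipestages:.2f}")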
b. [12] <2.2> Compare the area and total dynamic read energy per access of the pipelined
design versus the banked design. State which takes up less area and which requires
more power, and explain why that might be.
2.11 [12/15] <2.2> Consider the usage of critical word first and early restart on L2 cache misses. Assume a 1 MB L2 cache with 64 byte blocks and a refill path that is 16 bytes wide. Assume that the L2 can be written with 16 bytes every 4 processor cycles, the time to receive the first 16 byte block from the memory controller is 120 cycles, each additional 16 byte block from main memory requires 16 cycles, and data can be bypassed directly into the read port of the L2 cache. Ignore any cycles to transfer the miss request to the L2 cache and the requested data to the L1 cache.
a. [12] <2.2> How many cycles would it take to service an L2 cache miss with and without
critical word first and early restart?
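A hedged worked sketch for part (a), under one common reading of the stated parameters: the 64-byte block arrives as four 16-byte beats, the first taking 120 cycles and each later beat 16 cycles, with data bypassed directly to the L2 read port.

    # Hedged sketch for 2.11(a), using only numbers given in the exercise.
    block_bytes, beat_bytes = 64, 16
    first_beat, later_beat = 120, 16
    beats = block_bytes // beat_bytes  # 4 beats per block

    # Without critical word first / early restart: wait for the whole block.
    t_whole_block = first_beat + (beats - 1) * later_beat  # 120 + 3*16

    # With critical word first: the needed beat arrives first and is bypassed.
    t_critical_first = first_beat

    print(f"Without CWF/early restart: {t_whole_block} cycles")
    print(f"With CWF/early restart:    {t_critical_first} cycles")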
b. [15] <2.2> Do you think critical word first and early restart would be more important for L1 caches or L2 caches, and what factors would contribute to their relative importance?
2.12 [12/12] <2.2> You are designing a write buffer between a write-through L1 cache and a
write-back L2 cache. The L2 cache write data bus is 16 B wide and can perform a write to
an independent cache address every 4 processor cycles.
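A minimal sketch of the stated setup only (the exercise's sub-parts are not shown here): the write buffer drains one 16 B entry to the L2 every 4 cycles, so its occupancy depends on how that drain rate compares with the rate at which the write-through L1 produces stores. The store arrival rate below is purely hypothetical.

    # Hedged sketch of the 2.12 setup, not an answer to its sub-parts.
    drain_interval = 4   # cycles per 16 B write accepted by the L2 (from the exercise)
    store_interval = 5   # cycles between L1 write-throughs (hypothetical)

    # If stores arrive faster than entries drain, occupancy grows until the
    # buffer fills and the processor must stall on further stores.
    keeps_up = store_interval >= drain_interval
    print(f"Buffer keeps up with stores: {keeps_up}")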