simple mapping that works well is to spread the addresses of the block sequentially across the
banks, called sequential interleaving. For example, if there are four banks, bank 0 has all blocks
whose address modulo 4 is 0, bank 1 has all blocks whose address modulo 4 is 1, and so on.
Figure 2.6 shows this interleaving. Multiple banks are also a way to reduce power consumption
in both caches and DRAM.
FIGURE 2.6 Four-way interleaved cache banks using block addressing. Assuming 64
bytes per block, each of these addresses would be multiplied by 64 to get byte addressing.
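The modulo mapping described above can be sketched in a few lines. This is only an illustration: the function names are hypothetical, and the constants simply mirror the four-bank, 64-byte-block example in the text.

```python
# Illustrative constants taken from the example in the text.
NUM_BANKS = 4     # four-way interleaving
BLOCK_SIZE = 64   # bytes per block


def bank_of_block(block_addr: int, num_banks: int = NUM_BANKS) -> int:
    """Bank holding a block under sequential interleaving (address modulo banks)."""
    return block_addr % num_banks


def first_byte_of_block(block_addr: int, block_size: int = BLOCK_SIZE) -> int:
    """Convert a block address to the byte address of its first byte."""
    return block_addr * block_size


def bank_of_byte(byte_addr: int, block_size: int = BLOCK_SIZE,
                 num_banks: int = NUM_BANKS) -> int:
    """Bank holding the block that contains a given byte address."""
    return bank_of_block(byte_addr // block_size, num_banks)


# Group the first 16 block addresses by bank.
layout = {bank: [b for b in range(16) if bank_of_block(b) == bank]
          for bank in range(NUM_BANKS)}

for bank, blocks in layout.items():
    print(f"bank {bank}: blocks {blocks}")
```

Consecutive block addresses land in consecutive banks, so a sequential stream of blocks touches every bank in turn rather than queuing on one.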
Sixth Optimization: Critical Word First and Early Restart to Reduce Miss Penalty
This technique is based on the observation that the processor normally needs just one word of
the block at a time. This strategy is impatience: Don't wait for the full block to be loaded before
sending the requested word and restarting the processor. Here are two specific strategies:
Critical word first: Request the missed word first from memory and send it to the processor
as soon as it arrives; let the processor continue execution while filling the rest of the words
in the block.
Early restart: Fetch the words in normal order, but as soon as the requested word of the
block arrives, send it to the processor and let the processor continue execution.
Generally, these techniques benefit only designs with large cache blocks, since with small
blocks the requested word arrives only slightly before the rest of the block. Note that caches
normally continue to satisfy accesses to other blocks while the rest of the block is being filled.
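To make the comparison concrete, the sketch below estimates how long the processor waits for the requested word under each strategy. The timing model is an assumption for illustration, not something the text specifies: memory delivers the first word after `first` cycles and each later word after a further `per_word` cycles, and all function names are hypothetical.

```python
def wait_full_block(words: int, first: int, per_word: int) -> int:
    """Cycles until the whole block has arrived (no optimization)."""
    return first + (words - 1) * per_word


def wait_early_restart(words: int, first: int, per_word: int, k: int) -> int:
    """Cycles until word k arrives when words are fetched in normal order."""
    if not 0 <= k < words:
        raise ValueError("requested word index is outside the block")
    return first + k * per_word


def wait_critical_word_first(first: int) -> int:
    """Cycles until the requested word arrives when it is fetched first."""
    return first


# Assumed example: 8-word block, 100-cycle first word, 4 cycles per later word,
# and the processor missed on word 5.
large = (wait_full_block(8, 100, 4),
         wait_early_restart(8, 100, 4, 5),
         wait_critical_word_first(100))

# Same memory timing with a 2-word block, missing on word 1.
small = (wait_full_block(2, 100, 4),
         wait_early_restart(2, 100, 4, 1),
         wait_critical_word_first(100))

print("8-word block:", large)   # (128, 120, 100)
print("2-word block:", small)   # (104, 104, 100)
```

Under these assumed numbers the large block saves up to 28 cycles, while the small block saves at most 4, which is consistent with the observation that the techniques pay off mainly for large blocks.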
Alas, given spatial locality, there is a good chance that the next reference is to the rest of the
block. Just as with nonblocking caches, the miss penalty is not simple to calculate. When there
is a second request in critical word first, the effective miss penalty is the nonoverlapped time
from the reference until the second piece arrives. The benefits of critical word first and early
restart depend on the size of the block and the likelihood of another access to the portion of
the block that has not yet been fetched.
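One way to read the second-request case is sketched below. It rests on several assumptions that the text does not state: the fill proceeds in wrap-around order starting at the critical word, the first word arrives after `first` cycles with `per_word` cycles for each later word, and the processor issues its second reference `gap` cycles after it restarts. The names are hypothetical.

```python
def arrival_time(word: int, critical: int, words: int,
                 first: int, per_word: int) -> int:
    """Cycle (measured from the miss) at which `word` arrives, assuming a
    wrap-around fill order that starts at the critical word."""
    return first + ((word - critical) % words) * per_word


def total_stall(critical: int, second: int, gap: int, words: int,
                first: int, per_word: int) -> int:
    """Total cycles the processor is stalled across both references."""
    t_first = arrival_time(critical, critical, words, first, per_word)
    t_second = arrival_time(second, critical, words, first, per_word)
    second_ref_time = t_first + gap            # processor ran for `gap` cycles
    # Only the part of the wait that is not overlapped with execution counts.
    second_stall = max(0, t_second - second_ref_time)
    return t_first + second_stall


# Assumed example: 8-word block, 100-cycle first word, 4 cycles per later word,
# original miss on word 5.
next_word = total_stall(critical=5, second=6, gap=2, words=8, first=100, per_word=4)
prev_word = total_stall(critical=5, second=4, gap=2, words=8, first=100, per_word=4)
late_ref = total_stall(critical=5, second=4, gap=50, words=8, first=100, per_word=4)

print(next_word, prev_word, late_ref)   # 102 126 100
```

In this model a quick reference to the word just before the critical one waits for nearly the whole block (126 cycles), whereas the same reference issued 50 cycles later is fully overlapped with the fill and adds no stall.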
Seventh Optimization: Merging Write Buffer to Reduce Miss Penalty
Write-through caches rely on write buffers, as all stores must be sent to the next lower level
of the hierarchy. Even write-back caches use a simple buffer when a block is replaced. If the
write buffer is empty, the data and the full address are written in the buffer, and the write
is finished from the processor's perspective; the processor continues working while the write
buffer prepares to write the word to memory. If the buffer contains other modified blocks, the
addresses can be checked to see if the address of the new data matches the address of a valid
 