The cache index selects the tag to be tested to see if the desired block is in the cache. The size
of the index depends on cache size, block size, and set associativity. For the Opteron cache the
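set associativity is set to two, and we calculate the index as follows:

$$2^{\text{index}} = \frac{\text{Cache size}}{\text{Block size} \times \text{Set associativity}} = \frac{65{,}536}{64 \times 2} = 512 = 2^{9}$$
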
Hence, the index is 9 bits wide, and the tag is 34 - 9 or 25 bits wide. Although that is the index
needed to select the proper block, 64 bytes is much more than the processor wants to consume
at once. Hence, it makes more sense to organize the data portion of the cache memory 8
bytes wide, which is the natural data word of the 64-bit Opteron processor. Thus, in addition
to 9 bits to index the proper cache block, 3 more bits from the block offset are used to index
the proper 8 bytes. Index selection is step 2 in Figure B.5.
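
The field widths above can be checked with a short sketch. It assumes a 64 KB cache, which is not stated in this passage but is the size consistent with a 9-bit index, 64-byte blocks, and two-way set associativity. The name `split_address` and the constant names are hypothetical, chosen only for illustration.

```python
# Parameters from the text, except CACHE_SIZE_BYTES, which is an assumption
# inferred from the 9-bit index (512 sets x 64 bytes x 2 ways).
CACHE_SIZE_BYTES = 64 * 1024
BLOCK_SIZE_BYTES = 64
ASSOCIATIVITY = 2
BLOCK_ADDRESS_BITS = 34   # tag + index, as in "34 - 9"
WORD_SIZE_BYTES = 8

NUM_SETS = CACHE_SIZE_BYTES // (BLOCK_SIZE_BYTES * ASSOCIATIVITY)  # 512
OFFSET_BITS = BLOCK_SIZE_BYTES.bit_length() - 1                    # 6
INDEX_BITS = NUM_SETS.bit_length() - 1                             # 9
TAG_BITS = BLOCK_ADDRESS_BITS - INDEX_BITS                         # 25
WORD_SELECT_BITS = (BLOCK_SIZE_BYTES // WORD_SIZE_BYTES).bit_length() - 1  # 3


def split_address(physical_address):
    """Return (tag, index, word_select, byte_in_word) for an address."""
    block_offset = physical_address & (BLOCK_SIZE_BYTES - 1)
    block_address = physical_address >> OFFSET_BITS
    index = block_address & (NUM_SETS - 1)
    tag = block_address >> INDEX_BITS
    # The upper 3 bits of the block offset pick one 8-byte word in the block.
    word_select = block_offset >> (OFFSET_BITS - WORD_SELECT_BITS)
    byte_in_word = block_offset & (WORD_SIZE_BYTES - 1)
    return tag, index, word_select, byte_in_word
```

The 9 index bits and the 3 word-select bits together address one 8-byte entry of the data array, which is why the data portion can be organized 8 bytes wide.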
After the two tags are read from the cache, they are compared to the tag portion of the block
address from the processor. This comparison is step 3 in the figure. To be sure the tag contains
valid information, the valid bit must be set or else the results of the comparison are ignored.
Assuming one tag does match, the final step is to signal the processor to load the proper
data from the cache by using the winning input from a 2:1 multiplexor. The Opteron allows 2
clock cycles for these four steps, so the instructions in the following 2 clock cycles would wait
if they tried to use the result of the load.
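
As a rough software analogy of steps 3 and 4, the sketch below compares the address tag against both ways of the selected set, ignores any way whose valid bit is clear, and uses the matching way to select the data, much as the 2:1 multiplexor does. The names `Way`, `lookup`, and `read_word` are hypothetical; real hardware performs the two comparisons in parallel rather than in a loop.

```python
from dataclasses import dataclass


@dataclass
class Way:
    """One block frame of a two-way set (illustrative model only)."""
    valid: bool = False
    tag: int = 0
    data: bytes = bytes(64)   # one 64-byte block


def lookup(cache_set, tag):
    """Return the number of the way that hits, or None on a miss."""
    for way_number, way in enumerate(cache_set):
        # A tag match counts only if the valid bit is set.
        if way.valid and way.tag == tag:
            return way_number
    return None


def read_word(cache_set, tag, word_select):
    """Return the selected 8-byte word on a hit, or None on a miss."""
    way_number = lookup(cache_set, tag)
    if way_number is None:
        return None
    start = word_select * 8
    return cache_set[way_number].data[start:start + 8]
```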
Handling writes is more complicated than handling reads in the Opteron, as it is in any
cache. If the word to be written is in the cache, the first three steps are the same. Since the
Opteron executes out of order, only after it signals that the instruction has committed and the
cache tag comparison indicates a hit are the data written to the cache.
So far we have assumed the common case of a cache hit. What happens on a miss? On a read
miss, the cache sends a signal to the processor telling it the data are not yet available, and 64
bytes are read from the next level of the hierarchy. The latency is 7 clock cycles to the first 8
bytes of the block, and then 2 clock cycles per 8 bytes for the rest of the block. Since the data
cache is set associative, there is a choice on which block to replace. Opteron uses LRU, which
selects the block that was referenced longest ago, so every access must update the LRU bit.
Replacing a block means updating the data, the address tag, the valid bit, and the LRU bit.
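
The miss timing and the replacement choice can be sketched as follows. The latency function simply applies the figures quoted above (7 clock cycles to the first 8 bytes, then 2 per additional 8 bytes). The LRU model assumes a single bit per two-way set that names the way referenced longest ago. The names `miss_latency_cycles` and `TwoWayLRU` are hypothetical.

```python
def miss_latency_cycles(block_bytes=64, first_cycles=7,
                        later_cycles=2, chunk_bytes=8):
    """Cycles until the last 8-byte chunk of a missed block arrives."""
    if block_bytes <= 0:
        raise ValueError("block_bytes must be positive")
    chunks = -(-block_bytes // chunk_bytes)   # ceiling division
    return first_cycles + (chunks - 1) * later_cycles


class TwoWayLRU:
    """One LRU bit per set: it records which way was referenced longest ago."""

    def __init__(self):
        self.lru_way = 0

    def touch(self, way):
        # Every access makes the *other* way the least recently used one.
        self.lru_way = 1 - way

    def victim(self):
        # On a miss, the way named by the LRU bit is the one to replace.
        return self.lru_way
```

Under these figures, a full 64-byte block takes 7 + 7 × 2 = 21 clock cycles to arrive, although the first 8 bytes are available after 7.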
Since the Opteron uses write-back, the old data block could have been modified, and hence
it cannot simply be discarded. The Opteron keeps 1 dirty bit per block to record if the block
was written. If the “victim” was modified, its data and address are sent to the victim buffer.
(This structure is similar to a write buffer in other computers.) The Opteron has space for eight
victim blocks. In parallel with other cache actions, it writes victim blocks to the next level of
the hierarchy. If the victim buffer is full, the cache must wait.
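
A minimal sketch of this write-back path, assuming a simple first-in, first-out buffer with eight entries: a clean victim is simply dropped, a dirty victim is queued, and a full buffer forces the cache to wait. The names `VictimBuffer` and `evict` are hypothetical, and the next level of the hierarchy is modeled as a plain dictionary.

```python
from collections import deque


class VictimBuffer:
    """Holds modified blocks on their way to the next level (illustrative)."""

    def __init__(self, capacity=8):
        self.capacity = capacity
        self.entries = deque()

    def is_full(self):
        return len(self.entries) >= self.capacity

    def push(self, block_address, data):
        """Queue a victim; return False if the buffer is full."""
        if self.is_full():
            return False
        self.entries.append((block_address, data))
        return True

    def drain_one(self, next_level):
        """Write the oldest victim to the next level; return False if empty."""
        if not self.entries:
            return False
        block_address, data = self.entries.popleft()
        next_level[block_address] = data
        return True


def evict(block_address, data, dirty, victim_buffer):
    """Return True if replacement may proceed, False if the cache must wait."""
    if not dirty:
        return True   # an unmodified block can simply be discarded
    return victim_buffer.push(block_address, data)
```

Calling `drain_one` between evictions stands in for the writes that proceed in parallel with other cache actions.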
A write miss is very similar to a read miss, since the Opteron allocates a block on a read or
a write miss.
We have seen how it works, but the data cache cannot supply all the memory needs of the
processor: The processor also needs instructions. Although a single cache could try to supply
both, it can be a bottleneck. For example, when a load or store instruction is executed, the
pipelined processor will simultaneously request both a data word and an instruction word.
Hence, a single cache would present a structural hazard for loads and stores, leading to stalls.
One simple way to conquer this problem is to divide it: One cache is dedicated to instructions