traditionally wait for reads to complete but need not wait for writes. Amdahl's law ( Section
1.9 ) reminds us, however, that high-performance designs cannot neglect the speed of writes.
Fortunately, the common case is also the easy case to make fast. The block can be read from
the cache at the same time that the tag is read and compared, so the block read begins as soon
as the block address is available. If the read is a hit, the requested part of the block is passed
on to the processor immediately. If it is a miss, there is no benefit, but also no harm except
somewhat higher power consumption in desktop and server computers; the value read is simply ignored.
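The optimistic read can be sketched in a few lines. The following is a toy model, not hardware description: the names (`split_address`, `read`, the block size and set count) are all hypothetical, and a direct-mapped cache is assumed. The point is that the data access starts before the tag comparison finishes, and on a miss the speculatively read value is simply discarded.

```python
BLOCK_SIZE = 16   # bytes per block (assumed for this sketch)
NUM_SETS = 256    # direct-mapped: one block frame per set (assumed)

def split_address(addr):
    """Decompose a byte address into tag, index, and block offset."""
    offset = addr % BLOCK_SIZE
    index = (addr // BLOCK_SIZE) % NUM_SETS
    tag = addr // (BLOCK_SIZE * NUM_SETS)
    return tag, index, offset

def read(cache_tags, cache_data, valid, addr):
    tag, index, offset = split_address(addr)
    # In hardware these two accesses proceed in parallel in the same cycle:
    speculative_byte = cache_data[index][offset]      # data read begins at once
    hit = valid[index] and cache_tags[index] == tag   # tag compared meanwhile
    if hit:
        return speculative_byte   # forward the requested data immediately
    return None                   # miss: the value read is simply ignored
```

In software the two accesses are of course sequential; the parallelism exists only in the hardware the sketch is modeling.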
Such optimism is not allowed for writes. Modifying a block cannot begin until the tag is
checked to see whether the address is a hit. Because the tag check cannot overlap with the write
itself, writes normally take longer than reads. Another complexity is that the processor also specifies the size
of the write, usually between 1 and 8 bytes; only that portion of a block can be changed. In
contrast, reads can access more bytes than necessary without fear.
The write policies often distinguish cache designs. There are two basic options when writing
to the cache:
Write-through —The information is written to both the block in the cache and to the block in
the lower-level memory.
Write-back —The information is written only to the block in the cache. The modified cache
block is written to main memory only when it is replaced.
To reduce the frequency of writing back blocks on replacement, a feature called the dirty
bit is commonly used. This status bit indicates whether the block is dirty (modified while in
the cache) or clean (not modified). If it is clean, the block is not written back on a miss, since
the lower levels already hold an identical copy of its information.
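The two policies and the dirty bit can be contrasted with a toy model. Everything here is a hypothetical sketch (the `Block` class, the `memory` dictionary standing in for the lower level), not an implementation: write-through sends every write down immediately, while write-back defers the update until a dirty block is replaced.

```python
class Block:
    """One cache block frame with its data and dirty status bit (sketch)."""
    def __init__(self, data):
        self.data = data
        self.dirty = False

def write_through(block, memory, block_addr, offset, value):
    block.data[offset] = value
    memory[block_addr][offset] = value   # every write also goes to memory

def write_back(block, offset, value):
    block.data[offset] = value
    block.dirty = True                   # defer the memory update

def replace(block, memory, block_addr):
    if block.dirty:                      # clean blocks need no write-back
        memory[block_addr] = list(block.data)
        block.dirty = False
```

Note how two write-back writes to the same block cost only one eventual write to `memory`, which is the bandwidth saving the text describes.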
Both write-back and write-through have their advantages. With write-back, writes occur at
the speed of the cache memory, and multiple writes within a block require only one write to
the lower-level memory. Since some writes don't go to memory, write-back uses less memory
bandwidth, making write-back attractive in multiprocessors. Since write-back uses the rest of
the memory hierarchy and memory interconnect less than write-through, it also saves power,
making it attractive for embedded applications.
Write-through is easier to implement than write-back. The cache is always clean, so, unlike
write-back, read misses never result in writes to the lower level. Write-through also has the
advantage that the next lower level has the most current copy of the data, which simplifies data
coherency. Data coherency is important for multiprocessors and for I/O, which we examine in
Chapter 4 and Appendix D. Multilevel caches make write-through more viable for the upper-
level caches, as the writes need only propagate to the next lower level rather than all the way
to main memory.
As we will see, I/O and multiprocessors are fickle: They want write-back for processor
caches to reduce the memory traffic and write-through to keep the cache consistent with lower
levels of the memory hierarchy.
When the processor must wait for writes to complete during write-through, the processor is
said to write stall . A common optimization to reduce write stalls is a write buffer, which allows
the processor to continue as soon as the data are written to the buffer, thereby overlapping
processor execution with memory updating. As we will see shortly, write stalls can occur even
with write buffers.
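A write buffer can be sketched as a small bounded queue between the processor and memory. This is a hypothetical model (the `WriteBuffer` class and its method names are invented for illustration): the processor enqueues the write and continues, the memory side drains entries later, and a full buffer is exactly the write-stall case the text mentions.

```python
from collections import deque

class WriteBuffer:
    """Bounded FIFO of pending (address, value) writes (sketch)."""
    def __init__(self, capacity=4):
        self.capacity = capacity
        self.entries = deque()

    def processor_write(self, addr, value):
        """Return True if the processor can proceed without stalling."""
        if len(self.entries) == self.capacity:
            return False              # buffer full: this is a write stall
        self.entries.append((addr, value))
        return True                   # write absorbed; execution continues

    def drain_one(self, memory):
        """Memory side retires one buffered write when it is free."""
        if self.entries:
            addr, value = self.entries.popleft()
            memory[addr] = value
```

A real buffer must also check reads against pending entries so a load does not fetch a stale value from memory; that check is omitted here for brevity.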
Since the data are not needed on a write, there are two options on a write miss:
Write allocate —The block is allocated on a write miss, followed by the write hit actions
above. In this natural option, write misses act like read misses.
No-write allocate —In this apparently unusual alternative, write misses do not affect the
cache. Instead, the block is modified only in the lower-level memory.
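The two miss policies can be sketched as follows. The structures are hypothetical (`cache` and `memory` are dictionaries mapping a block address to its data), and the sketch shows only the allocation decision; in a real design the miss policy composes with the write-hit policy chosen above.

```python
def write_allocate(cache, memory, block_addr, offset, value):
    if block_addr not in cache:
        cache[block_addr] = list(memory[block_addr])  # allocate: fetch block
    cache[block_addr][offset] = value                 # then the write-hit actions

def no_write_allocate(cache, memory, block_addr, offset, value):
    if block_addr in cache:
        cache[block_addr][offset] = value             # hit: write the cache
    else:
        memory[block_addr][offset] = value            # miss: lower level only
```

With write allocate, subsequent writes to the same block will hit; with no-write allocate, they keep going to the lower level, which is why the pairings write-back/write-allocate and write-through/no-write-allocate are common.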