We can formalize this calculation by introducing c, the cache access time, m,
the main memory access time, and h, the hit ratio, which is the fraction of all
references that can be satisfied out of the cache. In our little example of the
previous paragraph, h = (k − 1)/k. Some authors also define the miss ratio,
which is 1 − h.
With these definitions, we can calculate the mean access time as follows:

mean access time = c + (1 − h) m
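Suppose, purely for illustration, that c = 1 nsec, m = 10 nsec, and h = 0.95 (these
numbers are not taken from any particular machine). Then the mean access time is
1 + (1 − 0.95) × 10 = 1.5 nsec, much closer to the cache time than to the main
memory time.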
As h → 1, all references can be satisfied out of the cache, and the access time
approaches c. On the other hand, as h → 0, a memory reference is needed every
time, so the access time approaches c + m, first a time c to check the cache
(unsuccessfully), and then a time m to do the memory reference. On some systems, the
memory reference can be started in parallel with the cache search, so that if a cache
miss occurs, the memory cycle has already been started. However, this strategy re-
quires that the memory can be stopped in its tracks on a cache hit, making the im-
plementation more complicated.
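The way the mean access time moves between these two extremes is easy to check
numerically. The short C sketch below is illustrative only; the timings c = 1 nsec
and m = 10 nsec are assumptions, not measurements of any real machine.

#include <stdio.h>

/* Mean access time = c + (1 - h) * m, with c and m in nanoseconds. */
static double mean_access_time(double c, double m, double h)
{
    return c + (1.0 - h) * m;
}

int main(void)
{
    const double c = 1.0;    /* assumed cache access time (nsec) */
    const double m = 10.0;   /* assumed main memory access time (nsec) */
    const double hit_ratios[] = { 0.0, 0.5, 0.9, 0.99, 1.0 };
    const int n = sizeof(hit_ratios) / sizeof(hit_ratios[0]);

    for (int i = 0; i < n; i++) {
        double h = hit_ratios[i];
        /* At h = 0 the result approaches c + m; at h = 1 it approaches c. */
        printf("h = %.2f  mean access time = %.2f nsec\n",
               h, mean_access_time(c, m, h));
    }
    return 0;
}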
Using the locality principle as a guide, main memories and caches are divided
up into fixed-size blocks. When talking about these blocks inside the cache, we
will often refer to them as cache lines. When a cache miss occurs, the entire cache
line is loaded from the main memory into the cache, not just the word needed. For
example, with a 64-byte line size, a reference to memory address 260 will pull the
line consisting of bytes 256 to 319 into one cache line. With a little bit of luck,
some of the other words in the cache line will be needed shortly. Operating this
way is more efficient than fetching individual words because it is faster to fetch k
words all at once than one word k times. Also, having cache entries be more than
one word means there are fewer of them, hence a smaller overhead is required.
Finally, many computers can transfer 64 or 128 bits in parallel on a single bus
cycle, even on 32-bit machines.
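The address-to-line arithmetic in that example is just alignment to a multiple of
the line size. Here is a minimal C sketch, assuming the 64-byte line size used
above, that reproduces the address-260 example:

#include <stdio.h>

#define LINE_SIZE 64  /* bytes per cache line, as in the example above */

int main(void)
{
    unsigned long addr = 260;  /* referenced memory address */
    unsigned long line_start = addr & ~(unsigned long)(LINE_SIZE - 1);
    unsigned long line_end = line_start + LINE_SIZE - 1;

    /* A miss on address 260 pulls in the whole 64-byte block, bytes 256..319. */
    printf("address %lu lies in the line covering bytes %lu to %lu\n",
           addr, line_start, line_end);
    return 0;
}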
Cache design is an increasingly important subject for high-performance CPUs.
One issue is cache size. The bigger the cache, the better it performs, but also the
slower it is to access and the more it costs. A second issue is the size of the cache
line. A 16-KB cache can be divided up into 1024 lines of 16 bytes, 2048 lines of 8
bytes, and other combinations. A third issue is how the cache is organized, that is,
how does the cache keep track of which memory words are currently being held?
We will examine caches in detail in Chap. 4.
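To make the line-size trade-off concrete, the following C sketch works out how
many lines a 16-KB cache holds for a few candidate line sizes (the 32- and 64-byte
cases are added here only for illustration); how those lines are actually organized
and looked up is the subject of Chap. 4.

#include <stdio.h>

int main(void)
{
    const int cache_size = 16 * 1024;            /* 16-KB cache, as in the text */
    const int line_sizes[] = { 8, 16, 32, 64 };  /* candidate line sizes (bytes) */
    const int n = sizeof(line_sizes) / sizeof(line_sizes[0]);

    for (int i = 0; i < n; i++) {
        int line_size = line_sizes[i];
        /* Fewer, larger lines mean less bookkeeping overhead,
           but each miss transfers more data. */
        printf("%2d-byte lines: %4d lines\n",
               line_size, cache_size / line_size);
    }
    return 0;
}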
A fourth design issue is whether instructions and data are kept in the same
cache or different ones. Having a unified cache (instructions and data use the
same cache) is a simpler design and automatically balances instruction fetches
against data fetches. Nevertheless, the trend these days is toward a split cache,
with instructions in one cache and data in the other. This design is also called a
Harvard architecture, the reference going all the way back to Howard Aiken's
Mark III computer, which had different memories for instructions and data. The
force driving designers in this direction is the widespread use of pipelined CPUs.
The instruction fetch unit needs to access instructions at the same time the operand
fetch unit needs access to data.