We can formalize this calculation by introducing c, the cache access time, m,
the main memory access time, and h, the hit ratio, which is the fraction of all
references that can be satisfied out of the cache. In our little example of the
previous paragraph, h = (k − 1)/k. Some authors also define the miss ratio,
which is 1 − h.
With these definitions, we can calculate the mean access time as follows:

mean access time = c + (1 − h) m
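Suppose, purely for illustration, that c = 1 nsec, m = 10 nsec, and h = 0.95 (these
numbers are not taken from any particular machine). Then the mean access time is
1 + (1 − 0.95) × 10 = 1.5 nsec, much closer to the cache time than to the main
memory time.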
As h → 1, all references can be satisfied out of the cache, and the access time
approaches c. On the other hand, as h → 0, a memory reference is needed every
time, so the access time approaches c + m, first a time c to check the cache
(unsuccessfully), and then a time m to do the memory reference. On some systems, the
memory reference can be started in parallel with the cache search, so that if a cache
miss occurs, the memory cycle has already been started. However, this strategy re-
quires that the memory can be stopped in its tracks on a cache hit, making the im-
plementation more complicated.
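The way the mean access time moves between these two extremes is easy to check
numerically. The short C sketch below is illustrative only; the timings c = 1 nsec
and m = 10 nsec are assumptions, not measurements of any real machine.

#include <stdio.h>

/* Mean access time = c + (1 - h) * m, with c and m in nanoseconds. */
static double mean_access_time(double c, double m, double h)
{
    return c + (1.0 - h) * m;
}

int main(void)
{
    const double c = 1.0;    /* assumed cache access time (nsec) */
    const double m = 10.0;   /* assumed main memory access time (nsec) */
    const double hit_ratios[] = { 0.0, 0.5, 0.9, 0.99, 1.0 };
    const int n = sizeof(hit_ratios) / sizeof(hit_ratios[0]);

    for (int i = 0; i < n; i++) {
        double h = hit_ratios[i];
        /* At h = 0 the result approaches c + m; at h = 1 it approaches c. */
        printf("h = %.2f  mean access time = %.2f nsec\n",
               h, mean_access_time(c, m, h));
    }
    return 0;
}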
Using the locality principle as a guide, main memories and caches are divided
up into fixed-size blocks. When talking about these blocks inside the cache, we
will often refer to them as cache lines. When a cache miss occurs, the entire cache
line is loaded from the main memory into the cache, not just the word needed. For
example, with a 64-byte line size, a reference to memory address 260 will pull the
line consisting of bytes 256 to 319 into one cache line. With a little bit of luck,
some of the other words in the cache line will be needed shortly. Operating this
way is more efficient than fetching individual words because it is faster to fetch k
words all at once than one word k times. Also, having cache entries be more than
one word means there are fewer of them, hence a smaller overhead is required.
Finally, many computers can transfer 64 or 128 bits in parallel on a single bus
cycle, even on 32-bit machines.
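The address-to-line arithmetic in that example is just alignment to a multiple of
the line size. Here is a minimal C sketch, assuming the 64-byte line size used
above, that reproduces the address-260 example:

#include <stdio.h>

#define LINE_SIZE 64  /* bytes per cache line, as in the example above */

int main(void)
{
    unsigned long addr = 260;  /* referenced memory address */
    unsigned long line_start = addr & ~(unsigned long)(LINE_SIZE - 1);
    unsigned long line_end = line_start + LINE_SIZE - 1;

    /* A miss on address 260 pulls in the whole 64-byte block, bytes 256..319. */
    printf("address %lu lies in the line covering bytes %lu to %lu\n",
           addr, line_start, line_end);
    return 0;
}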
Cache design is an increasingly important subject for high-performance CPUs.
One issue is cache size. The bigger the cache, the better it performs, but also the
slower it is to access and the more it costs. A second issue is the size of the cache
line. A 16-KB cache can be divided up into 1024 lines of 16 bytes, 2048 lines of 8
bytes, and other combinations. A third issue is how the cache is organized, that is,
how does the cache keep track of which memory words are currently being held?
We will examine caches in detail in Chap. 4.
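To make the line-size trade-off concrete, the following C sketch works out how
many lines a 16-KB cache holds for a few candidate line sizes (the 32- and 64-byte
cases are added here only for illustration); how those lines are actually organized
and looked up is the subject of Chap. 4.

#include <stdio.h>

int main(void)
{
    const int cache_size = 16 * 1024;            /* 16-KB cache, as in the text */
    const int line_sizes[] = { 8, 16, 32, 64 };  /* candidate line sizes (bytes) */
    const int n = sizeof(line_sizes) / sizeof(line_sizes[0]);

    for (int i = 0; i < n; i++) {
        int line_size = line_sizes[i];
        /* Fewer, larger lines mean less bookkeeping overhead,
           but each miss transfers more data. */
        printf("%2d-byte lines: %4d lines\n",
               line_size, cache_size / line_size);
    }
    return 0;
}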
A fourth design issue is whether instructions and data are kept in the same
cache or different ones. Having a unified cache (instructions and data use the
same cache) is a simpler design and automatically balances instruction fetches
against data fetches. Nevertheless, the trend these days is toward a split cache,
with instructions in one cache and data in the other. This design is also called a
Harvard architecture, the reference going all the way back to Howard Aiken's
Mark III computer, which had different memories for instructions and data. The
force driving designers in this direction is the widespread use of pipelined CPUs.
The instruction fetch unit needs to access instructions at the same time the operand
fetch unit needs access to data.