not the only way. The performance gains from the 80386 through the 80486, Pen-
tium, and later designs like the Core i7 are due to better implementations, as the ar-
chitecture has remained essentially the same through all of them.
Some kinds of improvements can be made only by changing the architecture.
Sometimes these changes are incremental, such as adding new instructions or reg-
isters, so that old programs will continue to run on the new models. In this case, to
get the full performance, the software must be changed, or at least recompiled with
a new compiler that takes advantage of the new features.
However, once every few decades, designers realize that the old architecture has
outlived its usefulness and that the only way to make progress is to start all over
again. The RISC revolution in the 1980s was one such breakthrough; another one
is in the air now. We will look at one example (the Intel IA-64) in Chap. 5.
In the rest of this section we will look at four different techniques for im-
proving CPU performance. We will start with three well-established implemen-
tation improvements and then move on to one that needs a little architectural sup-
port to work best. These techniques are cache memory, branch prediction, out-of-
order execution with register renaming, and speculative execution.
4.5.1 Cache Memory
One of the most challenging aspects of computer design throughout history has
been to provide a memory system able to supply operands to the processor at the
speed it can process them. The recent high rate of growth in processor speed has
not been accompanied by a corresponding speedup in memories. Relative to
CPUs, memories have been getting slower for decades. Given the enormous
importance of primary memory, this situation has greatly limited the development
of high-performance systems and has stimulated research on ways to get around
the problem of memory speeds that are much slower than CPU speeds and, rel-
atively speaking, getting worse every year.
Modern processors place overwhelming demands on a memory system, in
terms of both latency (the delay in supplying an operand) and bandwidth (the
amount of data supplied per unit of time). Unfortunately, these two aspects of a
memory system are largely at odds. Many techniques for increasing bandwidth do
so only by increasing latency. For example, the pipelining techniques used in the
Mic-3 can be applied to a memory system, with multiple, overlapping memory re-
quests handled efficiently. Unfortunately, as with the Mic-3, this results in greater
latency for individual memory operations. As processor clock speeds get faster, it
becomes more and more difficult to provide a memory system capable of supply-
ing operands in one or two clock cycles.
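The latency/bandwidth tension described above can be made concrete with a small back-of-the-envelope calculation. The cycle counts below are illustrative assumptions, not measurements of the Mic-3 or any real memory system: a plain memory takes 4 cycles per access, and the pipelined version splits the access into 4 overlapped stages, each paying 1 extra cycle of latching overhead.

```python
# Illustrative (assumed) timings -- not taken from any real machine.
ACCESS_CYCLES = 4        # one complete, unpipelined memory access
PIPELINE_STAGES = 4      # the access split into 4 overlapped stages
STAGE_OVERHEAD = 1       # assumed extra cycle of latch overhead per stage

# Unpipelined memory: one request completes every ACCESS_CYCLES cycles.
unpiped_latency = ACCESS_CYCLES            # cycles until the operand arrives
unpiped_bandwidth = 1 / ACCESS_CYCLES      # requests completed per cycle

# Pipelined memory: once the pipe is full, one request completes per cycle,
# but every request now traverses all the (slightly slower) stages.
piped_latency = PIPELINE_STAGES * (ACCESS_CYCLES // PIPELINE_STAGES + STAGE_OVERHEAD)
piped_bandwidth = 1.0                      # one completion per cycle, pipe full

print(f"unpipelined: latency={unpiped_latency} cycles, "
      f"bandwidth={unpiped_bandwidth:.2f} req/cycle")
print(f"pipelined:   latency={piped_latency} cycles, "
      f"bandwidth={piped_bandwidth:.2f} req/cycle")
```

With these numbers, pipelining quadruples the bandwidth but doubles the latency of each individual request, which is exactly the tradeoff the text describes.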
One way to attack this problem is by providing caches. As we saw in Sec.
2.2.5, a cache holds the most recently used memory words in a small, fast memory,
speeding up access to them. If a large enough percentage of the memory words
needed are in the cache, the effective memory latency can be reduced enormously.
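How strongly the hit rate drives the effective latency can be sketched with the standard weighted-average formula (the cycle counts here are illustrative assumptions, not figures for any particular machine): with hit rate h, cache access time c, and main-memory access time m, the average latency is h * c + (1 - h) * m.

```python
def effective_latency(hit_rate, cache_cycles, memory_cycles):
    """Average latency: hits are served by the cache, misses by main memory."""
    return hit_rate * cache_cycles + (1 - hit_rate) * memory_cycles

# Assumed timings: 1-cycle cache, 20-cycle main memory.
for h in (0.80, 0.95, 0.99):
    avg = effective_latency(h, 1, 20)
    print(f"hit rate {h:.0%}: {avg:.2f} cycles on average")
```

Note how nonlinear the payoff is: going from a 95% to a 99% hit rate cuts the average latency from 1.95 to 1.19 cycles, so the last few percent of hit rate matter enormously.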