THE INSTRUCTION SET ARCHITECTURE LEVEL - Structured Computer Organization

IA-32 is an ancient ISA with all the wrong properties for current technology. It is a CISC ISA with variable-length instructions and a myriad of different formats that are hard to decode quickly on the fly. Current technology works best with RISC ISAs that have one instruction length and a fixed-length opcode that is easy to decode. The IA-32 instructions can be broken up into RISC-like micro-operations at execution time, but doing so requires hardware (chip area), takes time, and adds complexity to the design. That is strike one.
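
To make the decoding problem concrete, here is a minimal sketch in Python. Both encodings are made up for illustration and are not the real IA-32 or any real RISC format. With a fixed instruction length, the start of every instruction is known in advance, so hardware could examine several instructions at once. With the hypothetical variable-length format, the start of each instruction is known only after the one before it has been inspected.

```python
# Toy illustration only: both encodings are hypothetical, not real ISA formats.

def fixed_length_starts(code: bytes, width: int = 4) -> list[int]:
    """Every instruction is `width` bytes, so each start offset is known
    without looking at the contents of any instruction."""
    return list(range(0, len(code), width))


def variable_length_starts(code: bytes) -> list[int]:
    """Hypothetical format: the low 2 bits of an instruction's first byte
    give the number of extra bytes that follow (0 to 3). Each start offset
    depends on the lengths of all earlier instructions."""
    starts = []
    offset = 0
    while offset < len(code):
        starts.append(offset)
        extra = code[offset] & 0b11  # must inspect this byte before moving on
        offset += 1 + extra
    return starts


fixed = bytes(12)  # three 4-byte instructions
variable = bytes([0x02, 0xAA, 0xBB, 0x00, 0x03, 0x11, 0x22, 0x33, 0x01, 0x44])

print(fixed_length_starts(fixed))        # [0, 4, 8]
print(variable_length_starts(variable))  # [0, 3, 4, 8]
```

The loop in `variable_length_starts` is inherently sequential, which is the software analogue of the decoding difficulty described above.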

The IA-32 is also a two-address memory-oriented ISA. Most instructions reference memory, and most programmers and compilers think nothing of referencing memory all the time. Current technology favors load/store ISAs that reference memory only to get the operands into registers but otherwise perform all their calculations using three-address register instructions. And with CPU clock speeds going up much faster than memory speeds, the problem will get worse with time. That is strike two.

The IA-32 also has a small and irregular register set. Not only does this tie compilers in knots, but the small number of general-purpose registers (four or six, depending on how you count ESI and EDI) requires intermediate results to be spilled into memory all the time, generating extra memory references even when they are not logically needed. That is strike three. The IA-32 is out.

Now let us start the second inning. The small number of registers causes many dependences, especially unnecessary WAR dependences, because results have to go somewhere and no extra registers are available. Getting around the lack of registers requires the implementation to do renaming internally (a terrible hack if ever there was one) to secret registers inside the reorder buffer. To avoid blocking on cache misses too often, instructions have to be executed out of order. However, the IA-32's semantics specify precise interrupts, so the out-of-order instructions have to be retired in order. All of these things require a lot of very complex hardware. Strike four.
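
The renaming idea can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not a description of how any particular IA-32 implementation works: the instruction tuple format, the `rename` function, and the `P0`, `P1`, ... physical register names are all hypothetical, and the sketch ignores the finite size of the reorder buffer and the freeing of registers at retirement.

```python
# Minimal sketch of register renaming; all names here are hypothetical.
from itertools import count


def rename(instructions, arch_regs):
    """Rewrite (dest, src1, src2) instructions so that every destination
    gets a fresh physical register."""
    fresh = (f"P{i}" for i in count())
    # Each architectural register starts out mapped to its own physical one.
    mapping = {reg: next(fresh) for reg in arch_regs}
    renamed = []
    for dest, src1, src2 in instructions:
        # Sources use the most recent mapping, so true (RAW) dependences remain.
        p1, p2 = mapping[src1], mapping[src2]
        # A brand-new destination cannot collide with earlier readers (WAR)
        # or earlier writers (WAW) of the same architectural register.
        mapping[dest] = next(fresh)
        renamed.append((mapping[dest], p1, p2))
    return renamed


program = [
    ("EAX", "EBX", "ECX"),  # EAX = EBX op ECX
    ("EBX", "EDX", "EDX"),  # EBX = EDX op EDX  (WAR on EBX with the line above)
    ("ECX", "EAX", "EBX"),  # ECX = EAX op EBX  (true RAW on EAX and EBX)
]
for instruction in rename(program, ["EAX", "EBX", "ECX", "EDX"]):
    print(instruction)
# ('P4', 'P1', 'P2')
# ('P5', 'P3', 'P3')
# ('P6', 'P4', 'P5')
```

After renaming, the second instruction writes `P5` while the first reads `P1`, so the WAR dependence on EBX is gone and the two could execute in either order. The third instruction still reads `P4` and `P5`, so the true dependences are preserved.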

Doing all this work quickly requires a deep pipeline. In turn, the deep pipeline means that instructions entered into it take many cycles before they are finished. Consequently, very accurate branch prediction is essential to make sure the right instructions are being entered into the pipeline. Because a misprediction requires the pipeline to be flushed, at great cost, even a fairly low misprediction rate can cause a substantial performance degradation. Strike five.
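
A rough, first-order model shows why even a low misprediction rate hurts. The formula and every number below are illustrative assumptions, not measurements of any real processor: the model simply adds the average flush cost per instruction to a base cycles-per-instruction (CPI) figure.

```python
# First-order model; all parameter values are assumed for illustration.

def effective_cpi(base_cpi, branch_fraction, mispredict_rate, flush_penalty):
    """Average cycles per instruction once pipeline flushes are included.

    branch_fraction: fraction of instructions that are branches
    mispredict_rate: fraction of branches that are mispredicted
    flush_penalty:   cycles lost per misprediction
    """
    return base_cpi + branch_fraction * mispredict_rate * flush_penalty


perfect = effective_cpi(0.5, 0.2, 0.00, 20)  # no mispredictions at all
actual = effective_cpi(0.5, 0.2, 0.05, 20)   # 5% of branches mispredicted

print(f"CPI with perfect prediction: {perfect:.2f}")        # 0.50
print(f"CPI with 5% mispredictions:  {actual:.2f}")         # 0.70
print(f"Slowdown factor:             {actual / perfect:.2f}")  # 1.40
```

Under these assumed values, a predictor that is right 95% of the time still makes the machine about 40% slower than one with perfect prediction, and the gap widens as the flush penalty (that is, the pipeline depth) grows.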

To alleviate the problems with mispredictions, the processor has to do speculative execution, with all the problems it entails, especially when memory references on the wrong path cause an exception. Strike six.

We are not going to play the whole baseball game here, but it should be clear by now that there is a problem. And we have not even mentioned that IA-32's 32-bit addresses limit individual programs to 4 GB of memory, which is a big problem on servers. EM64T solves this problem but not all the others.
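
The 4-GB figure follows directly from the address width, as this small calculation shows. The helper name `addressable_bytes` is made up for the example, and the 64-bit result is only the architectural ceiling implied by the address width, not what any given implementation actually supports.

```python
GIB = 2**30  # bytes in one binary gigabyte


def addressable_bytes(address_bits: int) -> int:
    """Number of distinct byte addresses an address of this width can name."""
    return 2**address_bits


print(addressable_bytes(32) // GIB)  # 4
print(addressable_bytes(64) // GIB)  # 17179869184
```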

All in all, the situation with IA-32 can be favorably compared to the state of celestial mechanics just prior to Copernicus. The then-current theory dominating
