THE INSTRUCTION SET ARCHITECTURE LEVEL - Structured Computer Organization

astronomy was that the earth was fixed and motionless in space and that the planets moved in circles with epicycles around it. However, as observations got better and more deviations from this model could be clearly observed, epicycles were added to the epicycles until the whole model just collapsed from its internal complexity.

Intel is in the same pickle now. A huge fraction of all the transistors on the Core i7 are devoted to decomposing CISC instructions, figuring out what can be done in parallel, resolving conflicts, making predictions, repairing the consequences of incorrect predictions, and other bookkeeping, leaving surprisingly few for doing the real work the user asked for. The conclusion that Intel is being inexorably driven to is the only sane conclusion: junk the whole thing (IA-32) and start all over with a clean slate (IA-64). The EM64T extension provides some breathing room, but it really just papers over the complexity issue.

The key idea behind the IA-64 is moving work from run time to compile time. On the Core i7, during execution the CPU reorders instructions, renames registers, schedules functional units, and does a lot of other work to determine how to keep all the hardware resources fully occupied. In the IA-64 model, the compiler figures out all these things in advance and produces a program that can be run as is, without the hardware having to juggle everything during execution. For example, rather than telling the compiler that the machine has eight registers when it actually has 128 and then trying to figure out at run time how to avoid dependences, the IA-64 model tells the compiler how many registers the machine really has, so it can produce a program that has no register conflicts to start with. Similarly, in this model, the compiler keeps track of which functional units are busy and does not issue instructions that use functional units that are not available. The model of making the underlying parallelism in the hardware visible to the compiler is called EPIC (Explicitly Parallel Instruction Computing). To some extent, EPIC can be thought of as the successor to RISC.
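To make the idea of moving scheduling work to compile time concrete, here is a minimal Python sketch of a greedy list scheduler of the kind a compiler could run. It is a toy, not Intel's compiler and not the Itanium 2's instruction format: the Instr record, the unit names "alu" and "mem", and the unit counts are hypothetical. It assumes every result is ready one cycle after issue and that each value already has its own register (something the compiler can arrange once it knows how many registers really exist), so the only dependences left are true read-after-write ones. Each returned group is a set of instructions that could be issued together with no run-time checking.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    name: str     # label for printing
    dest: str     # register written (assumed unique: no WAR/WAW hazards)
    srcs: tuple   # registers read
    unit: str     # functional unit type needed (hypothetical names)


def schedule(instrs, units):
    """Greedy list scheduling at compile time.

    Packs instructions into issue groups (one per cycle) so that each
    instruction issues only after its source operands are ready and only
    when a functional unit of the required type is still free in that cycle.
    Assumes a latency of one cycle for every instruction.
    """
    ready_at = {}   # register -> first cycle in which its value is available
    groups = []     # groups[c] = names of instructions issued in cycle c
    used = []       # used[c] = {unit type: number of units taken in cycle c}
    for ins in instrs:
        # Earliest cycle in which all source operands are available.
        cycle = max((ready_at.get(r, 0) for r in ins.srcs), default=0)
        # Slide forward until a unit of the required type is free.
        while True:
            while len(groups) <= cycle:
                groups.append([])
                used.append({})
            if used[cycle].get(ins.unit, 0) < units[ins.unit]:
                break
            cycle += 1
        groups[cycle].append(ins.name)
        used[cycle][ins.unit] = used[cycle].get(ins.unit, 0) + 1
        ready_at[ins.dest] = cycle + 1
    return groups


# Hypothetical machine: two ALUs and one memory unit.
units = {"alu": 2, "mem": 1}
prog = [
    Instr("ld r1", "r1", ("r10",), "mem"),
    Instr("ld r2", "r2", ("r11",), "mem"),
    Instr("add r3", "r3", ("r1", "r2"), "alu"),
    Instr("add r4", "r4", ("r12", "r13"), "alu"),
    Instr("add r5", "r5", ("r3", "r4"), "alu"),
]
print(schedule(prog, units))
# [['ld r1', 'add r4'], ['ld r2'], ['add r3'], ['add r5']]
```

Here the second load moves to cycle 1 only because the single memory unit is busy, while the independent add r4 fills a free ALU slot in cycle 0. This is the kind of bookkeeping the Core i7 performs in hardware during execution.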

The IA-64 model has a number of features that speed up performance. These include reducing memory references, instruction scheduling, reducing conditional branches, and speculation. We will now examine each of these in turn and discuss how they are implemented in the Itanium 2.

The Itanium 2 has a simple memory model. Memory consists of up to 2^64 bytes of linear memory. Instructions are available to access memory in units of 1, 2, 4, 8, 16, and 10 bytes, the latter for 80-bit IEEE 754 floating-point numbers. Memory references need not be aligned on their natural boundaries, but a performance penalty is incurred if they are not. Memory can be either big endian or little endian, determined by a bit in a register loadable by the operating system.
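A toy model can make these rules concrete. The following Python sketch, an illustration only, represents memory as a small bytearray (the real address space is up to 2^64 bytes), accepts only the access sizes listed above, lets a misaligned access succeed while counting it as a stand-in for the performance penalty, and uses a big_endian flag in place of the OS-loadable register bit. The Memory class and its method names are hypothetical. Treating the natural boundary as a multiple of the access size is an assumption, and the sketch does not check alignment for the 10-byte case, whose boundary the text does not give.

```python
class Memory:
    """Toy byte-addressed memory (hypothetical, for illustration only)."""

    SIZES = (1, 2, 4, 8, 10, 16)   # access sizes listed in the text

    def __init__(self, size_bytes, big_endian=False):
        self.data = bytearray(size_bytes)
        self.big_endian = big_endian       # stands in for the OS-set mode bit
        self.misaligned_accesses = 0       # stands in for the performance penalty

    def _check(self, addr, size):
        if size not in self.SIZES:
            raise ValueError(f"unsupported access size {size}")
        if addr < 0 or addr + size > len(self.data):
            raise IndexError("access outside memory")
        # Assumed natural boundary: a multiple of the size, checked only
        # for power-of-two sizes (the 10-byte case is skipped).
        if (size & (size - 1)) == 0 and addr % size != 0:
            self.misaligned_accesses += 1   # allowed, but counted

    def _order(self):
        return "big" if self.big_endian else "little"

    def store(self, addr, size, value):
        self._check(addr, size)
        self.data[addr:addr + size] = value.to_bytes(size, self._order())

    def load(self, addr, size):
        self._check(addr, size)
        return int.from_bytes(self.data[addr:addr + size], self._order())


mem = Memory(32)                    # little endian by default
mem.store(0, 4, 0x11223344)
print(hex(mem.data[0]))             # 0x44: least significant byte stored first
mem.big_endian = True               # stand-in for the OS flipping the mode bit
print(hex(mem.load(0, 4)))          # 0x44332211: same bytes, other byte order
mem.store(3, 8, 1)                  # 3 is not a multiple of 8: misaligned
print(mem.misaligned_accesses)      # 1
```

Flipping the flag between the store and the load shows that the same four bytes are interpreted differently under the two byte orders, while the misaligned store still completes and is merely counted.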
