astronomy was that the earth was fixed and motionless in space and that the planets
moved in circles with epicycles around it. However, as observations got better and
more deviations from this model could be clearly observed, epicycles were added
to the epicycles until the whole model just collapsed from its internal complexity.
Intel is in the same pickle now. A huge fraction of all the transistors on the
Core i7 are devoted to decomposing CISC instructions, figuring out what can be
done in parallel, resolving conflicts, making predictions, repairing the
consequences of incorrect predictions, and other bookkeeping, leaving surprisingly few
for doing the real work the user asked for. The conclusion that Intel is being
inexorably driven to is the only sane conclusion: junk the whole thing (IA-32) and start
all over with a clean slate (IA-64). EM64T provides some breathing room,
but it really just papers over the complexity issue.
5.8.2 The IA-64 Model: Explicitly Parallel Instruction Computing
The key idea behind the IA-64 is moving work from run time to compile time.
On the Core i7, during execution the CPU reorders instructions, renames registers,
schedules functional units, and does a lot of other work to determine how to keep
all the hardware resources fully occupied. In the IA-64 model, the compiler
figures out all these things in advance and produces a program that can be run as is,
without the hardware having to juggle everything during execution. For example,
rather than telling the compiler that the machine has eight registers when it actually
has 128 and then trying to figure out at run time how to avoid dependences, the
IA-64 model tells the compiler how many registers the machine really has so it
can produce a program that does not have any register conflicts to start with.
Similarly, in this model, the compiler keeps track of which functional units are busy and
does not issue instructions that use functional units that are not available. The
model of making the underlying parallelism in the hardware visible to the compiler
is called EPIC (Explicitly Parallel Instruction Computing). To some extent,
EPIC can be thought of as the successor to RISC.
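As a rough illustration of the work EPIC moves from the hardware into the compiler, the sketch below statically packs instructions into per-cycle issue groups. Everything in it is an assumption made for illustration: the `Instr` record, the `UNITS` table giving how many instructions of each functional-unit class may issue per cycle, the one-cycle latency, and the IA-64-flavored instruction strings. It also assumes that each register is written only once, which is roughly what a large register file lets the compiler arrange, so only true data dependences and unit availability limit placement. It is a simple greedy list scheduler, not the algorithm any actual Itanium compiler uses.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    text: str                 # illustrative assembly text only
    unit: str                 # hypothetical functional-unit class: "I", "M", or "F"
    dest: str | None          # register written, or None (e.g., a store)
    srcs: tuple[str, ...] = ()


# Hypothetical machine description visible to the compiler:
# how many instructions of each unit class can issue per cycle.
UNITS = {"I": 2, "M": 2, "F": 1}


def schedule(program, units=UNITS, latency=1):
    """Greedily place each instruction in the earliest cycle where its
    operands are ready and a unit of the right class is still free.

    Assumes single assignment (no register is written twice), so
    write-after-read and write-after-write hazards cannot occur.
    Memory dependences between loads and stores are ignored.
    """
    ready = {}                                      # register -> cycle its value is available
    used = defaultdict(lambda: defaultdict(int))    # cycle -> unit class -> slots taken
    groups = defaultdict(list)                      # cycle -> instructions issued that cycle
    for ins in program:
        cycle = max((ready.get(r, 0) for r in ins.srcs), default=0)
        while used[cycle][ins.unit] >= units[ins.unit]:
            cycle += 1                              # no free unit: try the next cycle
        used[cycle][ins.unit] += 1
        groups[cycle].append(ins.text)
        if ins.dest is not None:
            ready[ins.dest] = cycle + latency
    if not groups:
        return []
    return [groups[c] for c in range(max(groups) + 1)]  # empty lists = idle cycles


prog = [
    Instr("ld8 r10 = [r1]", "M", "r10", ("r1",)),
    Instr("ld8 r11 = [r2]", "M", "r11", ("r2",)),
    Instr("add r12 = r10, r11", "I", "r12", ("r10", "r11")),
    Instr("add r13 = r3, r4", "I", "r13", ("r3", "r4")),
    Instr("st8 [r5] = r12", "M", None, ("r5", "r12")),
]

for cycle, group in enumerate(schedule(prog)):
    print(f"cycle {cycle}: {group}")
```

Here the independent `add r13` is hoisted into cycle 0 alongside the two loads, while the dependent `add` and the store wait for their operands. On a Core i7 this kind of reordering is done by the hardware at run time; in the EPIC model the resulting groups are simply emitted as the program.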
The IA-64 model has a number of features that speed up performance. These
include reducing memory references, instruction scheduling, reducing conditional
branches, and speculation. We will now examine each of these in turn and discuss
how they are implemented in the Itanium 2.
5.8.3 Reducing Memory References
The Itanium 2 has a simple memory model. Memory consists of up to $2^{64}$
bytes of linear memory. Instructions are available to access memory in units of 1,
2, 4, 8, 16, and 10 bytes, the latter for 80-bit IEEE 754 floating-point numbers.
Memory references need not be aligned on their natural boundaries, but a
performance penalty is incurred if they are not. Memory can be either big endian or
little endian, determined by a bit in a register loadable by the operating system.
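The two properties just described, optional alignment and a selectable byte order, can be illustrated with a small sketch that models memory as a Python `bytes` object. The function names and the `big_endian` flag (a stand-in for the operating-system-controlled mode bit) are hypothetical, and the sketch says nothing about the performance penalty, which exists only on real hardware.

```python
def is_naturally_aligned(addr: int, size: int) -> bool:
    """True if addr is a multiple of the access size.

    Only the power-of-two access sizes are handled here; the 10-byte
    access used for 80-bit floating-point values is left out.
    """
    if size not in (1, 2, 4, 8, 16):
        raise ValueError(f"unsupported access size: {size}")
    return addr % size == 0


def load_u64(mem: bytes, addr: int, big_endian: bool) -> int:
    """Read an 8-byte unsigned integer starting at addr.

    Unaligned addresses are accepted, as on the Itanium 2; big_endian
    is a hypothetical stand-in for the OS-loadable byte-order bit.
    """
    if addr < 0 or addr + 8 > len(mem):
        raise IndexError("access outside memory")
    return int.from_bytes(mem[addr:addr + 8], "big" if big_endian else "little")


mem = bytes(range(16))                           # bytes 0x00, 0x01, ..., 0x0f
print(hex(load_u64(mem, 0, big_endian=False)))   # 0x706050403020100
print(hex(load_u64(mem, 0, big_endian=True)))    # 0x1020304050607
print(is_naturally_aligned(3, 8), hex(load_u64(mem, 3, big_endian=False)))  # False 0xa09080706050403
```

The same eight bytes yield different integers depending on the byte-order setting, and the load at address 3 succeeds even though it does not fall on an 8-byte boundary.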
 
 
 