Memory access is a huge bottleneck in all modern computers because CPUs
are so much faster than memory. One way to reduce memory references is to have
a large level 1 cache on chip and an even larger level 2 cache close to the chip. All
modern designs have these two caches. But one can go beyond caching to look for
other ways to reduce memory references, and the IA-64 uses some of these ways.
The best way to speed up memory references is to avoid having them in the
first place. The Itanium 2 implementation of the IA-64 model has 128 general-purpose
64-bit registers. The first 32 of these are static, but the remaining 96 are used
as a register stack, very similar to the register window scheme in other RISC
processors, such as the UltraSPARC. However, unlike the UltraSPARC, the number of
registers visible to the program is variable and can change from procedure to
procedure. Thus each procedure has access to 32 static registers and some (variable)
number of dynamically allocated registers.
When a procedure is called, the register stack pointer is advanced so the input
parameters are visible in registers, but no registers are allocated for local variables.
The procedure itself decides how many registers it needs and advances the register
stack pointer to allocate them. These registers need not be saved on entry or
restored on exit, although if the procedure needs to modify a static register it must take
care to explicitly save it first and restore it later. By making the number of
registers available variable and tailored to what each procedure needs, scarce registers
are not wasted and procedure calls can go deeper before registers have to be spilled
to memory.
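
The register stack behavior described above can be sketched in a few lines of Python. This is only a toy model under simplifying assumptions, not a description of the actual Itanium 2 hardware: the class `RegisterStack`, its methods `alloc`, `call`, `ret`, and `visible`, and the helper `nest` are names made up for illustration, and spilling is modeled as a simple low-water mark rather than real memory traffic.

```python
STATIC_REGS = 32    # r0-r31: visible to every procedure
STACKED_REGS = 96   # r32-r127: managed as a register stack


class RegisterStack:
    """Toy model of a variable-size register stack (hypothetical names)."""

    def __init__(self, physical=STACKED_REGS):
        self.physical = physical  # stacked registers that exist on chip
        self.base = 0             # stack position of the current frame's first register
        self.size = 0             # registers in the current frame
        self.outputs = 0          # trailing registers holding outgoing parameters
        self.callers = []         # suspended (base, size, outputs) frames
        self.spilled = 0          # stack positions below this live in memory

    def alloc(self, inputs, local_vars, outputs):
        """The procedure itself sets its frame size: inputs + locals + outputs.

        The model does not check that `inputs` matches what the caller passed.
        """
        size = inputs + local_vars + outputs
        if size > self.physical:
            raise ValueError("frame larger than the stacked register file")
        self.size, self.outputs = size, outputs
        # Spill the oldest registers only if the live stack no longer fits.
        top = self.base + self.size
        if top - self.spilled > self.physical:
            self.spilled = top - self.physical

    def call(self):
        """Advance the stack so the caller's outputs become the callee's inputs."""
        self.callers.append((self.base, self.size, self.outputs))
        self.base += self.size - self.outputs
        self.size, self.outputs = self.outputs, 0

    def ret(self):
        """Return to the caller's frame, reloading it if it was spilled."""
        self.base, self.size, self.outputs = self.callers.pop()
        self.spilled = min(self.spilled, self.base)

    def visible(self):
        """Registers the current procedure can name: static plus its frame."""
        return STATIC_REGS + self.size


def nest(depth, local_vars):
    """Nest `depth` calls, each allocating `local_vars` locals; return registers spilled."""
    rs = RegisterStack()
    for _ in range(depth):
        rs.alloc(0, local_vars, 0)
        rs.call()
    return rs.spilled
```

In this model, ten nested procedures that each ask for 4 local registers occupy only 40 of the 96 stacked registers, so `nest(10, 4)` spills nothing, whereas ten procedures that each ask for 20 registers overflow the stack and force the oldest registers out to memory. That is the point of letting each procedure choose its own frame size.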
The Itanium 2 also has 128 floating-point registers in IEEE 754 format. They
do not operate as a register stack. This very large number of registers means that
many floating-point computations can keep all their intermediate results in
registers and avoid having to store temporary results in memory.
There are also 64 1-bit predicate registers, eight branch registers, and 128
special-purpose application registers used for various purposes, such as passing
parameters between application programs and the operating system. An overview of
the Itanium 2's registers is given in Fig. 5-46.
5.8.4 Instruction Scheduling
One of the main problems in the Core i7 is the difficulty of scheduling the
various instructions over the various functional units and avoiding dependences.
Exceedingly complex mechanisms are needed to handle these issues at run time,
and a large fraction of the chip area is devoted to managing them. The IA-64 and
Itanium 2 avoid all these problems by having the compiler do the work. The key
idea is that a program consists of a sequence of instruction groups. Within
certain boundaries, all the instructions within a group do not conflict with one another,
do not use more functional units and resources than the machine has, do not
contain RAW and WAW dependences, and have only certain restricted WAR
dependences. Consecutive instruction groups give the appearance of being executed
 
 