by stalled instructions. (These counters are not shown in the figure.) The rule for
issuing instructions now has to be extended to prevent the issue of any instruction
with an operand scheduled to be stored into by an instruction that came before it
but was skipped over.
Now let us look back at I6, I7, and I8 in Fig. 4-43. Here we see that I6 computes
a value in R1 that is used by I7. However, we also see that the value is never
used again because I8 overwrites R1. There is no real reason to use R1 as the place
to hold the result of I6. Worse yet, R1 is a terrible choice of intermediate register,
although a perfectly reasonable one for a compiler or programmer used to the idea
of sequential execution with no instruction overlap.
In Fig. 4-44 we introduce a new technique for solving this problem: register
renaming. The wise decode unit changes the use of R1 in I6 (cycle 3) and I7
(cycle 4) to a secret register, S1, not visible to the programmer. Now I6 can be
issued concurrently with I5. Modern CPUs often have dozens of secret registers
for use with register renaming. This technique can often eliminate WAR and
WAW dependences.
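
The effect of renaming on dependences can be checked mechanically. The following is a minimal sketch in Python, not the decode unit's actual logic. The instruction sequence is an assumption chosen to match the description in the text (I6 writes R1, I7 reads it, I8 overwrites it); the operands of I5 and the remaining source registers are hypothetical, as are the names Instr, dependences, and all_dependences. The check is pairwise and ignores intervening writes, which is enough for this small example.

```python
from typing import NamedTuple


class Instr(NamedTuple):
    name: str     # label such as "I6"
    dest: str     # register written
    srcs: tuple   # registers read


def dependences(earlier, later):
    """Return the kinds of dependence that 'later' has on 'earlier'."""
    found = set()
    if earlier.dest in later.srcs:
        found.add("RAW")   # later reads what earlier writes (true dependence)
    if later.dest in earlier.srcs:
        found.add("WAR")   # later overwrites a register earlier still reads
    if later.dest == earlier.dest:
        found.add("WAW")   # both write the same register
    return found


def all_dependences(program):
    """Map each (earlier, later) pair of names to its set of dependences."""
    result = {}
    for i, earlier in enumerate(program):
        for later in program[i + 1:]:
            deps = dependences(earlier, later)
            if deps:
                result[(earlier.name, later.name)] = deps
    return result


# Hypothetical sequence: only the roles of R1 are taken from the text.
before = [
    Instr("I5", "R7", ("R1", "R2")),
    Instr("I6", "R1", ("R0", "R2")),
    Instr("I7", "R3", ("R3", "R1")),
    Instr("I8", "R1", ("R4", "R4")),
]

# Same sequence after renaming R1 to S1 in I6 and I7, and to S2 in I8.
after = [
    Instr("I5", "R7", ("R1", "R2")),
    Instr("I6", "S1", ("R0", "R2")),
    Instr("I7", "R3", ("R3", "S1")),
    Instr("I8", "S2", ("R4", "R4")),
]

deps_before = all_dependences(before)
deps_after = all_dependences(after)
```

Under these assumptions, deps_before contains WAR and WAW entries involving R1, while deps_after retains only the RAW dependence of I7 on I6. That remaining dependence is a true data dependence, which renaming cannot remove.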
At I8, we use register renaming again. This time R1 is renamed into S2 so the
addition can be started before R1 is free, at the end of cycle 6. If it turns out that
the result really has to be in R1 this time, the contents of S2 can always be copied
back there just in time. Even better, all future instructions needing it can have their
sources renamed to the register where it really is stored. In any case, the I8
addition got to start earlier this way.
On many real machines, renaming is deeply embedded in the way the registers
are organized. There are many secret registers and a table that maps the registers
visible to the programmer onto the secret registers. Thus the real register being
used for, say, R0 is located by looking at entry 0 of this mapping table. In this way,
there is no real register R0, just a binding between the name R0 and one of the
secret registers. This binding changes frequently during execution to avoid
dependences.
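
A mapping table of this kind can be sketched as follows. This is an illustrative Python model under stated assumptions, not a description of any particular machine: the class name RenameTable, its method rename, the number of secret registers, and the instruction operands are all hypothetical, and the recycling of secret registers once their values are dead is left out. The secret-register numbers the sketch assigns do not correspond to S1 and S2 in Fig. 4-44.

```python
class RenameTable:
    """Binds programmer-visible register names to secret registers."""

    def __init__(self, visible, num_secret):
        # Pool of secret registers not currently bound to any name.
        self.free = [f"S{i}" for i in range(num_secret)]
        # Initially, each visible name is bound to its own secret register.
        self.table = {name: self.free.pop(0) for name in visible}

    def rename(self, dest, srcs):
        """Rename one instruction; return (secret dest, secret sources)."""
        # Look up the sources first, so an instruction that reads and
        # writes the same name still reads the old binding.
        secret_srcs = tuple(self.table[s] for s in srcs)
        # Every write gets a fresh secret register, so it cannot collide
        # with earlier readers or writers of the same visible name.
        secret_dest = self.free.pop(0)
        self.table[dest] = secret_dest
        return secret_dest, secret_srcs


rt = RenameTable(["R0", "R1", "R2", "R3", "R4"], num_secret=16)

i6 = rt.rename("R1", ("R0", "R2"))   # an I6-like write to R1
i7 = rt.rename("R3", ("R3", "R1"))   # an I7-like read of that R1
i8 = rt.rename("R1", ("R4", "R4"))   # an I8-like overwrite of R1
```

In this sketch the two writes to R1 land in different secret registers, and the instruction between them reads the first one. After the last call, the entry for R1 in rt.table points at the register holding the newest value, which is where later instructions would find it.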
Notice in Fig. 4-44, when reading down the fourth column, that the instructions
have not been issued in order. Nor have they been retired in order. The
conclusion of this example is simple: using out-of-order execution and register
renaming, we were able to speed up the computation by a factor of two.
4.5.4 Speculative Execution
In the previous section we introduced the concept of reordering instructions in
order to improve performance. Although we did not mention it explicitly, the
focus there was on reordering instructions within a single basic block. It is now
time to look at this point more closely.
Computer programs can be broken up into basic blocks, each consisting of a
linear sequence of code with one entry point on top and one exit on the bottom. A
basic block does not contain any control structures (e.g., if statements or while
 
 