THE MICROARCHITECTURE LEVEL - Structured Computer Organization

by stalled instructions. (These counters are not shown in the figure.) The rule for

issuing instructions now has to be extended to prevent the issue of any instruction

with an operand scheduled to be stored into by an instruction that came before it

but was skipped over.
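
The added issue condition can be sketched as a simple set check. This is a minimal illustration, assuming each instruction is described only by the registers it touches. The function name `can_issue` and the register sets below are hypothetical, and the counters mentioned above are not modeled.

```python
def can_issue(candidate_regs, skipped_dests):
    """Added issue condition only: block the candidate if any of its operands
    (sources or destination) is the pending destination of an earlier
    instruction that was skipped over.
    """
    return set(candidate_regs).isdisjoint(skipped_dests)


# Hypothetical situation: an earlier instruction that will write R6 was skipped.
skipped_dests = {"R6"}

print(can_issue({"R7", "R1", "R2"}, skipped_dests))  # True: no operand is R6
print(can_issue({"R3", "R6", "R0"}, skipped_dests))  # False: R6 is still pending
```

A real issue unit would combine this check with its other issue rules; only the extension described in the text is shown here.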

Now let us look back at I6, I7, and I8 in Fig. 4-43. Here we see that I6 computes

a value in R1 that is used by I7. However, we also see that the value is never

used again because I8 overwrites R1. There is no real reason to use R1 as the place

to hold the result of I6. Worse yet, R1 is a terrible choice of intermediate register,

although a perfectly reasonable one for a compiler or programmer used to the idea

of sequential execution with no instruction overlap.

In Fig. 4-44 we introduce a new technique for solving this problem: register

renaming. The wise decode unit changes the use of R1 in I6 (cycle 3) and I7

(cycle 4) to a secret register, S1, not visible to the programmer. Now I6 can be

issued concurrently with I5. Modern CPUs often have dozens of secret registers

for use with register renaming. This technique can often eliminate WAR and

WAW dependences.
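
To make the effect of renaming concrete, here is a minimal sketch that classifies the dependences between two instructions and then repeats the check after renaming. The operand lists are illustrative assumptions, not copied from Fig. 4-43; only the facts stated in the text (I6 writes R1, I7 reads it, I8 overwrites it) are taken from the source. The `Instr` type and `hazards` function are hypothetical names.

```python
from typing import NamedTuple


class Instr(NamedTuple):
    name: str
    dest: str      # register written
    srcs: tuple    # registers read


def hazards(earlier, later):
    """Return the dependences that `later` has on `earlier`."""
    found = set()
    if earlier.dest in later.srcs:
        found.add("RAW")   # later reads what earlier writes (true dependence)
    if later.dest in earlier.srcs:
        found.add("WAR")   # later overwrites what earlier still has to read
    if later.dest == earlier.dest:
        found.add("WAW")   # both write the same register
    return found


# Hypothetical operands, chosen only to match the roles described in the text.
i5 = Instr("I5", "R7", ("R1", "R2"))
i6 = Instr("I6", "R1", ("R0", "R2"))
i7 = Instr("I7", "R3", ("R3", "R1"))
i8 = Instr("I8", "R1", ("R4", "R4"))

print(hazards(i5, i6))  # {'WAR'}
print(hazards(i6, i8))  # {'WAW'}

# Rename R1 to the secret register S1 in I6 (destination) and I7 (source).
i6_renamed = i6._replace(dest="S1")
i7_renamed = i7._replace(srcs=("R3", "S1"))

print(hazards(i5, i6_renamed))          # set()
print(hazards(i6_renamed, i7_renamed))  # {'RAW'}
```

The RAW dependence between I6 and I7 survives renaming because I7 genuinely needs the value I6 produces; only the dependences caused by reusing the name R1 disappear.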

At I8, we use register renaming again. This time R1 is renamed into S2 so the

addition can be started before R1 is free, at the end of cycle 6. If it turns out that

the result really has to be in R1 this time, the contents of S2 can always be copied

back there just in time. Even better, all future instructions needing it can have their

sources renamed to the register where it really is stored. In any case, the I8 addition

got to start earlier this way.

On many real machines, renaming is deeply embedded in the way the registers

are organized. There are many secret registers and a table that maps the registers

visible to the programmer onto the secret registers. Thus the real register being

used for, say, R0 is located by looking at entry 0 of this mapping table. In this way,

there is no real register R0, just a binding between the name R0 and one of the

secret registers. This binding changes frequently during execution to avoid

dependences.
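
The mapping-table organization can be sketched as follows. This is a simplified model under stated assumptions, not a description of any particular machine: the class name `RenameTable`, the free-list policy, and the register counts are all hypothetical, the secret-register numbering does not correspond to Fig. 4-44, and returning secret registers to the free list when instructions retire is omitted.

```python
from collections import deque


class RenameTable:
    """Maps programmer-visible register names onto secret registers."""

    def __init__(self, visible_regs, num_secret):
        # Pool of secret registers not currently bound to any visible name.
        self.free = deque(f"S{i}" for i in range(num_secret))
        # Every visible name starts out bound to some secret register.
        self.binding = {reg: self.free.popleft() for reg in visible_regs}

    def rename(self, dest, srcs):
        """Rename one instruction; returns (secret_dest, secret_srcs)."""
        # Sources are looked up first, so they see the bindings made by
        # earlier instructions.
        secret_srcs = tuple(self.binding[s] for s in srcs)
        # The destination gets a fresh secret register. popleft() raises
        # IndexError if none is free; a real machine would stall instead.
        secret_dest = self.free.popleft()
        self.binding[dest] = secret_dest
        return secret_dest, secret_srcs


table = RenameTable(["R0", "R1", "R2", "R3", "R4"], num_secret=16)

# Hypothetical instruction sequence: two writes to R1 with a read in between.
print(table.rename("R1", ("R0", "R2")))  # ('S5', ('S0', 'S2'))
print(table.rename("R3", ("R3", "R1")))  # ('S6', ('S3', 'S5'))
print(table.rename("R1", ("R4", "R4")))  # ('S7', ('S4', 'S4'))
```

Because each write to R1 is bound to a different secret register, the second write no longer has to wait for earlier readers or writers of R1, while the intervening read still sees the value produced by the first write.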

Notice in Fig. 4-44, when reading down the fourth column, that the instructions

have not been issued in order. Nor have they been retired in order. The conclusion

of this example is simple: using out-of-order execution and register renaming,

we were able to speed up the computation by a factor of two.

In the previous section we introduced the concept of reordering instructions in

order to improve performance. Although we did not mention it explicitly, the

focus there was on reordering instructions within a single basic block. It is now

time to look at this point more closely.

Computer programs can be broken up into basic blocks, each consisting of a

linear sequence of code with one entry point on top and one exit on the bottom. A

basic block does not contain any control structures (e.g., if statements or while
