Instruction-Level Parallelism and Its Exploitation - Computer Architecture: A Quantitative Approach

Hardware Reference

In-Depth Information

Dynamic Scheduling Using Tomasulo's Approach

The IBM 360/91 floating-point unit used a sophisticated scheme to allow out-of-order execu-

tion. This scheme, invented by Robert Tomasulo, tracks when operands for instructions are

available to minimize RAW hazards and introduces register renaming in hardware to minim-

ize WAW and WAR hazards. There are many variations on this scheme in modern processors,

although the key concepts of tracking instruction dependences to allow execution as soon as

operands are available and renaming registers to avoid WAR and WAW hazards are common

characteristics.

IBM's goal was to achieve high floating-point performance from an instruction set and from

compilers designed for the entire 360 computer family, rather than from specialized compilers

for the high-end processors. The 360 architecture had only four double-precision loating-

point registers, which limits the effectiveness of compiler scheduling; this fact was another

motivation for the Tomasulo approach. In addition, the IBM 360/91 had long memory accesses

and long floating-point delays, which Tomasulo's algorithm was designed to overcome. At the

end of the section, we will see that Tomasulo's algorithm can also support the overlapped ex-

ecution of multiple iterations of a loop.

We explain the algorithm, which focuses on the floating-point unit and load-store unit, in

the context of the MIPS instruction set. The primary difference between MIPS and the 360 is

the presence of register-memory instructions in the later architecture. Because Tomasulo's al-

gorithm uses a load functional unit, no significant changes are needed to add register-memory

addressing modes. The IBM 360/91 also had pipelined functional units, rather than multiple

functional units, but we describe the algorithm as if there were multiple functional units. It is

a simple conceptual extension to also pipeline those functional units.

As we will see, RAW hazards are avoided by executing an instruction only when its oper-

ands are available, which is exactly what the simpler scoreboarding approach provides. WAR

and WAW hazards, which arise from name dependences, are eliminated by register renam-

ing. Register renaming eliminates these hazards by renaming all destination registers, includ-

ing those with a pending read or write for an earlier instruction, so that the out-of-order write

does not affect any instructions that depend on an earlier value of an operand.

To beter understand how register renaming eliminates WAR and WAW hazards, consider

the following example code sequence that includes potential WAR and WAW hazards:

DIV.D F0,F2,F4

ADD.D F6,F0,F8

S.D F6,0(R1)

SUB.D F8,F10,F14

MUL.D F6,F10,F8

There are two antidependences: between the ADD.D and the SUB.D and between the S.D and the

MUL.D . There is also an output dependence between the ADD.D and the MUL.D , leading to three pos-

sible hazards: WAR hazards on the use of F8 by ADD.D and the use of F6 by the SUB.D , as well as

a WAW hazard since the ADD.D may finish later than the MUL.D . There are also three true data

dependences: between the DIV.D and the ADD.D , between the SUB.D and the MUL.D , and between the

ADD.D and the S.D .

These three name dependences can all be eliminated by register renaming. For simplicity,

assume the existence of two temporary registers, S and T . Using S and T , the sequence can be

rewritten without any dependences as:

DIV.D

F0,F2,F4

ADD.D

S,F0,F8

S.D

S,0(R1)

Computer Architecture: A Quantitative Approach

Search WWH ::

Custom Search

Home