Hardware Reference
In-Depth Information
SUB.D T,F10,F14
MUL.D F6,F10,T
In addition, any subsequent uses of F8 must be replaced by the register T. In this code
segment, the renaming process can be done statically by the compiler. Finding any uses of F8 that
are later in the code requires either sophisticated compiler analysis or hardware support, since
there may be intervening branches between the above code segment and a later use of F8. As
we will see, Tomasulo's algorithm can handle renaming across branches.
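The static renaming described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name rename, the fresh names T0, T1, ... and the tuple encoding of instructions are all hypothetical, and the leading ADD.D is an assumed earlier instruction that reads F8 and writes F6 (the excerpt shows only the last two instructions, already renamed). The sketch handles straight-line code only, which is exactly the limitation the text points out once branches intervene.

```python
from itertools import count

def rename(instrs):
    """Give every destination a fresh name (straight-line code only).

    instrs is a list of (op, dest, src1, src2) tuples.
    Returns the renamed list and the final register -> name mapping.
    """
    fresh = (f"T{i}" for i in count())
    current = {}   # architectural register -> name holding its latest value
    out = []
    for op, dest, src1, src2 in instrs:
        # Sources read whichever name currently holds the register's value.
        s1 = current.get(src1, src1)
        s2 = current.get(src2, src2)
        # Each write gets a fresh name, so it cannot clobber an earlier
        # value that is still needed (WAR) or race an earlier write (WAW).
        new_dest = next(fresh)
        current[dest] = new_dest
        out.append((op, new_dest, s1, s2))
    return out, current

program = [
    ("ADD.D", "F6", "F0", "F8"),    # assumed earlier reader of F8, writer of F6
    ("SUB.D", "F8", "F10", "F14"),  # WAR hazard on F8 with the ADD.D
    ("MUL.D", "F6", "F10", "F8"),   # WAW hazard on F6 with the ADD.D
]
renamed, final_names = rename(program)
for instr in renamed:
    print(instr)
```

After renaming, the SUB.D writes T1 rather than F8 and the MUL.D reads T1, so neither instruction conflicts with the ADD.D any longer. The final mapping records that later uses of F8 must read T1, which is the bookkeeping a compiler cannot easily carry across branches.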
In Tomasulo's scheme, register renaming is provided by reservation stations, which buffer the
operands of instructions waiting to issue. The basic idea is that a reservation station fetches
and buffers an operand as soon as it is available, eliminating the need to get the operand from
a register. In addition, pending instructions designate the reservation station that will provide
their input. Finally, when successive writes to a register overlap in execution, only the last
one is actually used to update the register. As instructions are issued, the register specifiers
for pending operands are renamed to the names of the reservation station, which provides
register renaming.
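One way to picture this issue-time renaming is a small table that records, for each register, which reservation station will write it. The Python sketch below is a simplification under stated assumptions, not the 360/91 design: the class RegisterStatus, its method names, the register values, and the station names Add1, Add2 and Mult1 are all hypothetical.

```python
class RegisterStatus:
    """Tracks, per register, which reservation station (if any) will write it."""

    def __init__(self, values):
        self.values = dict(values)   # current register contents
        self.producer = {}           # register -> name of the station that will write it

    def read_operand(self, reg):
        """Return ('value', v) if the register is ready, else ('tag', station)."""
        if reg in self.producer:
            return ("tag", self.producer[reg])
        return ("value", self.values[reg])

    def issue(self, station, dest, src1, src2):
        """Rename the sources, then record `station` as the producer of `dest`."""
        operands = (self.read_operand(src1), self.read_operand(src2))
        self.producer[dest] = station   # a later issue to the same dest replaces this
        return operands

    def write_back(self, station, dest, result):
        """Update the register only if `station` is still its latest producer."""
        if self.producer.get(dest) == station:
            self.values[dest] = result
            del self.producer[dest]
            return True
        return False

regs = RegisterStatus({"F0": 1.0, "F6": 0.0, "F8": 2.0, "F10": 3.0, "F14": 4.0})
add_ops = regs.issue("Add1", "F6", "F0", "F8")     # ADD.D F6,F0,F8 (assumed)
sub_ops = regs.issue("Add2", "F8", "F10", "F14")   # SUB.D F8,F10,F14
mul_ops = regs.issue("Mult1", "F6", "F10", "F8")   # MUL.D F6,F10,F8
stale = regs.write_back("Add1", "F6", 3.0)         # an older write to F6 finishes
latest = regs.write_back("Mult1", "F6", -3.0)      # the last write to F6 finishes
```

In this sketch the ADD.D copies the old value of F8 when it issues, so the SUB.D may overwrite F8 at any time without a WAR hazard. The MUL.D receives the tag Add2 instead of a register name, and the write from Add1 never reaches F6 because Mult1 has since become its producer, which removes the WAW hazard. In a fuller model the result from Add1 would still be forwarded to any station waiting on that tag.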
Since there can be more reservation stations than real registers, the technique can even
eliminate hazards arising from name dependences that could not be eliminated by a compiler. As
we explore the components of Tomasulo's scheme, we will return to the topic of register
renaming and see exactly how the renaming occurs and how it eliminates WAR and WAW
hazards.
The use of reservation stations, rather than a centralized register file, leads to two other
important properties. First, hazard detection and execution control are distributed: The
information held in the reservation stations at each functional unit determines when an instruction
can begin execution at that unit. Second, results are passed directly to functional units from
the reservation stations where they are buffered, rather than going through the registers. This
bypassing is done with a common result bus that allows all units waiting for an operand to be
loaded simultaneously (on the 360/91 this is called the common data bus, or CDB). In pipelines
with multiple execution units and issuing multiple instructions per clock, more than one result
bus will be needed.
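The broadcast on the common result bus, and the local readiness check it enables, can be sketched as follows. This is a minimal Python model under stated assumptions: the Station fields vj, vk, qj and qk (operand values and pending producer tags), the broadcast function, and the station names are hypothetical choices for illustration, and one function call stands in for one bus transfer.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Station:
    name: str
    vj: Optional[float] = None   # first operand value, once known
    vk: Optional[float] = None   # second operand value, once known
    qj: Optional[str] = None     # station that will produce the first operand
    qk: Optional[str] = None     # station that will produce the second operand

    def ready(self):
        # Distributed control: the station decides from its own fields.
        return self.qj is None and self.qk is None

def broadcast(stations, tag, value):
    """Deliver (tag, value) to every operand slot waiting on `tag`."""
    for s in stations:
        if s.qj == tag:
            s.vj, s.qj = value, None
        if s.qk == tag:
            s.vk, s.qk = value, None

stations = [
    Station("Mult1", vj=3.0, qk="Add2"),      # waits on Add2 for one operand
    Station("Add3", qj="Add2", qk="Add2"),    # waits on Add2 for both operands
    Station("Mult2", vj=1.0, qk="Load1"),     # waits on a different producer
]
broadcast(stations, "Add2", -1.0)
ready_now = [s.name for s in stations if s.ready()]
print(ready_now)
```

A single broadcast fills every slot that was waiting on Add2, so Mult1 and Add3 become ready together without the value passing through the register file, while Mult2 keeps waiting for Load1.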
Figure 3.6 shows the basic structure of a Tomasulo-based processor, including both the
floating-point unit and the load/store unit; none of the execution control tables is shown. Each
reservation station holds an instruction that has been issued and is awaiting execution at a
functional unit and either the operand values for that instruction, if they have already been
computed, or else the names of the reservation stations that will provide the operand values.