branch already in the pipeline. In Section 3.6, we will see how speculation support removes
this restriction.
Tomasulo's Algorithm: A Loop-Based Example
To understand the full power of eliminating WAW and WAR hazards through dynamic
renaming of registers, we must look at a loop. Consider the following simple sequence for
multiplying the elements of an array by a scalar in F2:
Loop:   L.D     F0,0(R1)
        MUL.D   F4,F0,F2
        S.D     F4,0(R1)
        DADDIU  R1,R1,-8
        BNE     R1,R2,Loop   ; branches if R1≠R2
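For readers who prefer a high-level view, the loop can be sketched in Python as follows. This is only an illustrative equivalent, not part of the original example: the function name scale_in_place is hypothetical, a list index stands in for the byte address held in R1, and the sketch assumes the array is walked from its last element down to its first, as the negative stride suggests.

```python
def scale_in_place(x, scalar):
    """Multiply every element of x by scalar, walking from the last element to the first."""
    i = len(x) - 1            # stands in for R1 (an index here, not a byte address)
    # The assembly loop tests at the bottom (BNE); this sketch tests at the top
    # so that an empty list is handled safely.
    while i >= 0:
        f0 = x[i]             # L.D    F0,0(R1)
        f4 = f0 * scalar      # MUL.D  F4,F0,F2   (scalar plays the role of F2)
        x[i] = f4             # S.D    F4,0(R1)
        i -= 1                # DADDIU R1,R1,-8   (one 8-byte double per step)
    return x
```

Each iteration reuses the names F0 and F4, which is what creates the WAW and WAR hazards between successive iterations.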
If we predict that branches are taken, using reservation stations will allow multiple
executions of this loop to proceed at once. This advantage is gained without changing the code: in
effect, the loop is unrolled dynamically by the hardware using the reservation stations
obtained by renaming to act as additional registers.
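The renaming step can be made concrete with a minimal sketch that issues the three floating-point instructions of the loop twice and records, for each one, what its operands wait on. The function name issue_two_iterations, the station labels (Load1, Mult1, Store1, and so on), and the Regs[...] notation for an operand already available in the register file are assumptions chosen for illustration; real hardware also tracks addresses, busy bits, and values that are omitted here.

```python
def issue_two_iterations():
    """Issue L.D / MUL.D / S.D twice and record which producer each operand waits on."""
    program = [
        ("L.D", "F0", []),               # writes F0, no floating-point sources
        ("MUL.D", "F4", ["F0", "F2"]),   # writes F4, reads F0 and F2
        ("S.D", None, ["F4"]),           # a store writes no register, reads F4
    ]
    prefix = {"L.D": "Load", "MUL.D": "Mult", "S.D": "Store"}  # hypothetical labels
    reg_status = {}   # register name -> tag of the station that will produce it
    stations = {}     # station tag -> operand sources (a producer tag or Regs[...])
    for iteration in (1, 2):
        for op, dest, sources in program:
            tag = f"{prefix[op]}{iteration}"
            # Renaming happens at issue: if a register has a pending producer,
            # the operand records that producer's tag instead of the register name.
            stations[tag] = [reg_status.get(src, f"Regs[{src}]") for src in sources]
            if dest is not None:
                reg_status[dest] = tag   # later readers of dest wait on this tag
    return stations, reg_status
```

In this sketch the second MUL.D waits on Load2 rather than on "F0", and the first store still waits on Mult1 even after F4 has been reassigned to Mult2. Because each consumer holds a tag rather than a register name, the two iterations no longer conflict over F0 and F4.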
Let's assume we have issued all the instructions in two successive iterations of the loop, but
none of the floating-point load/stores or operations has completed. Figure 3.10 shows
reservation stations, register status tables, and load and store buffers at this point. (The integer ALU
operation is ignored, and it is assumed the branch was predicted as taken.) Once the system
reaches this state, two copies of the loop could be sustained with a CPI close to 1.0, provided
the multiplies could complete in four clock cycles. With a latency of six cycles, additional
iterations will need to be processed before the steady state can be reached. This requires more
reservation stations to hold instructions that are in execution. As we will see later in this chapter,
when extended with multiple instruction issue, Tomasulo's approach can sustain more than
one instruction per clock.
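A rough way to see why a longer multiply latency calls for more reservation stations is to count how many multiplies are in flight at once. The sketch below is a back-of-the-envelope model, not the timing analysis behind the figures quoted above: it assumes one multiply is issued every issue_interval_cycles and that each one holds its station for occupancy_cycles (waiting for its operand, executing, and writing its result). The function name and the occupancy values in the usage lines are hypothetical.

```python
import math

def stations_in_use(occupancy_cycles, issue_interval_cycles):
    """Peak number of stations busy when one instruction is issued every
    issue_interval_cycles and each holds its station for occupancy_cycles."""
    return math.ceil(occupancy_cycles / issue_interval_cycles)

# The loop has five instructions, so with single issue a new MUL.D
# arrives every five cycles. The occupancy figures are made-up examples.
short_latency = stations_in_use(occupancy_cycles=8, issue_interval_cycles=5)
long_latency = stations_in_use(occupancy_cycles=12, issue_interval_cycles=5)
```

Under these assumed numbers the shorter occupancy keeps two multiplier stations busy and the longer one keeps three busy, which illustrates the general point that more iterations must be in flight, and more stations available, as latency grows.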