Hardware Reference
In-Depth Information
clock cycles in this stage, and loads still require two steps in this stage. Stores need only
have the base register available at this step, since execution for a store at this point is only
effective address calculation.
3. Write result: When the result is available, write it on the CDB (with the ROB tag sent when
the instruction issued) and from the CDB into the ROB, as well as to any reservation
stations waiting for this result. Mark the reservation station as available. Special actions are
required for store instructions. If the value to be stored is available, it is written into the
Value field of the ROB entry for the store. If the value to be stored is not available yet, the
CDB must be monitored until that value is broadcast, at which time the Value field of the
ROB entry of the store is updated. For simplicity we assume that this occurs during the
write result stage of a store; we discuss relaxing this requirement later.
4. Commit: This is the final stage of completing an instruction, after which only its result
remains. (Some processors call this commit phase “completion” or “graduation.”) There are
three different sequences of actions at commit depending on whether the committing
instruction is a branch with an incorrect prediction, a store, or any other instruction (normal
commit). The normal commit case occurs when an instruction reaches the head of the ROB
and its result is present in the buffer; at this point, the processor updates the register with
the result and removes the instruction from the ROB. Committing a store is similar except
that memory is updated rather than a result register. When a branch with incorrect
prediction reaches the head of the ROB, it indicates that the speculation was wrong. The ROB is
flushed and execution is restarted at the correct successor of the branch. If the branch was
correctly predicted, the branch is finished.
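The write result and commit steps can be sketched in a few lines of Python. This is a minimal illustration rather than a model of any real processor: the names `ROBEntry`, `write_result`, and `commit` are hypothetical, registers and memory are plain dictionaries, and reservation stations and timing are left out entirely.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional


@dataclass
class ROBEntry:
    tag: int                         # ROB entry number, used as the CDB tag
    kind: str                        # "alu", "store", or "branch" (assumed labels)
    dest: object = None              # register name or memory address
    value: Optional[float] = None    # the Value field of the entry
    ready: bool = False              # True once the result has been written
    mispredicted: bool = False       # only meaningful for branches
    correct_pc: Optional[int] = None # correct successor of a branch


def write_result(rob, tag, value):
    """Write result: the ROB entry matching the CDB tag captures the value."""
    for entry in rob:
        if entry.tag == tag:
            entry.value = value
            entry.ready = True


def commit(rob, regs, mem):
    """Commit the head of the ROB if it is ready.

    Returns the restart PC after a mispredicted branch, otherwise None.
    """
    if not rob or not rob[0].ready:
        return None                  # head not finished, so nothing commits
    head = rob.popleft()             # the entry is reclaimed
    if head.kind == "branch":
        if head.mispredicted:
            rob.clear()              # flush all speculative instructions
            return head.correct_pc
    elif head.kind == "store":
        mem[head.dest] = head.value  # memory is updated only at commit
    else:
        regs[head.dest] = head.value # normal commit updates the register
    return None
```

Because `commit` only ever looks at the head of the queue, an instruction that finishes execution early still waits behind older, unfinished instructions, which is the in-order commit behavior described above.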
Once an instruction commits, its entry in the ROB is reclaimed and the register or memory
destination is updated, eliminating the need for the ROB entry. If the ROB fills, we simply stop
issuing instructions until an entry is made free. Now, let's examine how this scheme would
work with the same example we used for Tomasulo's algorithm.
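The stall-on-full rule can be illustrated with a small bounded queue. This is only a sketch under simplifying assumptions: the `ReorderBuffer` class and its method names are hypothetical, tags are a simple running counter rather than hardware slot numbers, and `retire_head` stands in for a full commit.

```python
from collections import deque


class ReorderBuffer:
    """Bounded FIFO of in-flight instructions (illustrative only)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = deque()
        self.next_tag = 0

    def is_full(self):
        return len(self.entries) >= self.capacity

    def issue(self, instr):
        """Allocate an entry and return its tag, or None to signal a stall."""
        if self.is_full():
            return None              # no free entry: issue must wait
        tag = self.next_tag
        self.next_tag += 1
        self.entries.append((tag, instr))
        return tag

    def retire_head(self):
        """Free the oldest entry, as a commit would."""
        return self.entries.popleft()


rob = ReorderBuffer(capacity=2)
rob.issue("L.D")                     # gets tag 0
rob.issue("MUL.D")                   # gets tag 1
stalled = rob.issue("SUB.D")         # None: the ROB is full
rob.retire_head()                    # the L.D entry is freed
resumed = rob.issue("SUB.D")         # succeeds now, tag 2
```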
Example
Assume the same latencies for the floating-point functional units as in earlier
examples: add is 2 clock cycles, multiply is 6 clock cycles, and divide is 12 clock
cycles. Using the code segment below, the same one we used to generate Figure
3.8, show what the status tables look like when the MUL.D is ready to go to
commit.
L.D     F6,32(R2)
L.D     F2,44(R3)
MUL.D   F0,F2,F4
SUB.D   F8,F2,F6
DIV.D   F10,F0,F6
ADD.D   F6,F8,F2
Answer
Figure 3.12 shows the result in the three tables. Notice that although the SUB.D
instruction has completed execution, it does not commit until the MUL.D commits.
The reservation stations and register status field contain the same basic
information that they did for Tomasulo's algorithm (see page 176 for a description of
those fields). The differences are that reservation station numbers are replaced