Hardware Reference
of Figure 3.8 on page 179, which shows the same code sequence in operation on a processor
with Tomasulo's algorithm. The key difference is that, in the example above, no instruction
after the earliest uncompleted instruction (MUL.D above) is allowed to complete. In contrast, in
Figure 3.8 the SUB.D and ADD.D instructions have also completed.
One implication of this difference is that the processor with the ROB can dynamically
execute code while maintaining a precise interrupt model. For example, if the MUL.D instruction
caused an interrupt, we could simply wait until it reached the head of the ROB and take the
interrupt, flushing any other pending instructions from the ROB. Because instruction commit
happens in order, this yields a precise exception.
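The commit rule described above can be sketched in a few lines of Python. This is only an illustrative model, not the hardware's actual structure: the names RobEntry and commit, the register values, and the choice of F0 as the MUL.D destination are assumptions made for the example. The point it shows is that results reach the register file only from the head of the ROB, so a faulting instruction can flush everything behind it before any younger result becomes visible.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class RobEntry:
    op: str              # mnemonic, e.g. "MUL.D"
    dest: str            # destination register (hypothetical assignment)
    value: float = 0.0   # result; meaningful only once done is True
    done: bool = False   # has the instruction finished executing?
    fault: bool = False  # did it raise an exception while executing?


def commit(rob, regs):
    """Retire finished instructions from the head of the ROB, in order.

    Returns the faulting entry if an exception is taken, otherwise None.
    """
    while rob and rob[0].done:
        head = rob.popleft()
        if head.fault:
            rob.clear()               # flush all younger, pending instructions
            return head               # registers hold only older results
        regs[head.dest] = head.value  # results reach registers only at commit
    return None


# Illustrative state: SUB.D and ADD.D have finished, MUL.D is still executing.
regs = {"F0": 1.0, "F8": 2.0, "F6": 3.0}
rob = deque([
    RobEntry("MUL.D", "F0"),
    RobEntry("SUB.D", "F8", value=10.0, done=True),
    RobEntry("ADD.D", "F6", value=20.0, done=True),
])

blocked = commit(rob, regs)   # head is not done, so nothing commits yet

rob[0].done = True            # MUL.D finishes, but with an exception
rob[0].fault = True
faulting = commit(rob, regs)  # exception taken at the head; ROB is flushed
```

In this sketch, F8 and F6 keep their old values after the exception even though SUB.D and ADD.D had already finished executing, which is the property that makes the exception precise.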
By contrast, in the example using Tomasulo's algorithm, the SUB.D and ADD.D instructions
could both complete before the MUL.D raised the exception. The result is that the registers F8
and F6 (destinations of the SUB.D and ADD.D instructions) could be overwritten, and the interrupt
would be imprecise.
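A minimal sketch of the imprecise case, assuming a simplified model in which each result is written to the register file as soon as its instruction completes. The function name write_back_at_completion, the register values, and F0 as the MUL.D destination are hypothetical and chosen only for illustration; real Tomasulo hardware involves reservation stations and a common data bus that are not modeled here.

```python
def write_back_at_completion(completions, regs):
    """Apply results in completion order, with no reorder buffer.

    Each item is (op, dest, value, faulted). Returns the mnemonic of the
    first faulting instruction, or None if none faults.
    """
    for op, dest, value, faulted in completions:
        if faulted:
            return op          # exception raised here
        regs[dest] = value     # register overwritten immediately
    return None


# Illustrative values: the later SUB.D and ADD.D complete before MUL.D faults.
regs_no_rob = {"F0": 1.0, "F8": 2.0, "F6": 3.0}
completions = [
    ("SUB.D", "F8", 10.0, False),
    ("ADD.D", "F6", 20.0, False),
    ("MUL.D", "F0", 0.0, True),
]
faulting_op = write_back_at_completion(completions, regs_no_rob)
```

When the MUL.D exception is raised, F8 and F6 already hold the new values, so the register state no longer corresponds to any single point in program order.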
Some users and architects have decided that imprecise floating-point exceptions are
acceptable in high-performance processors, since the program will likely terminate; see Appendix J
for further discussion of this topic. Other types of exceptions, such as page faults, are much
more difficult to accommodate if they are imprecise, since the program must transparently
resume execution after handling such an exception.
The use of a ROB with in-order instruction commit provides precise exceptions, in addition
to supporting speculative execution, as the next example shows.
Example
Consider the code example used earlier for Tomasulo's algorithm and shown in
Figure 3.10 in execution:
Loop: L.D    F0,0(R1)
      MUL.D  F4,F0,F2
      S.D    F4,0(R1)
      DADDIU R1,R1,#-8
      BNE    R1,R2,Loop ;branches if R1≠R2
Assume that we have issued all the instructions in the loop twice. Let's also
assume that the L.D and MUL.D from the first iteration have committed and all
other instructions have completed execution. Normally, the store would wait in the
ROB for both the effective address operand (R1 in this example) and the value
(F4 in this example). Since we are only considering the floating-point pipeline,
assume the effective address for the store is computed by the time the
instruction is issued.
Answer
Figure 3.13 shows the result in two tables.
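The ROB contents implied by the stated assumptions can be enumerated with a short Python sketch. It does not reproduce Figure 3.13; it only lists, in issue order, which instructions have committed and which have completed but still occupy ROB entries. The names LOOP and rob_snapshot and the state labels are assumptions made for this illustration.

```python
# The five loop instructions, in program order.
LOOP = [
    "L.D F0,0(R1)",
    "MUL.D F4,F0,F2",
    "S.D F4,0(R1)",
    "DADDIU R1,R1,#-8",
    "BNE R1,R2,Loop",
]


def rob_snapshot(iterations, committed):
    """List every issued instruction with its state, oldest first.

    The first `committed` instructions have left the ROB; the rest are
    assumed to have completed execution and to be waiting to commit.
    """
    issued = [(it, ins) for it in range(1, iterations + 1) for ins in LOOP]
    return [
        {
            "iteration": it,
            "instruction": ins,
            "state": "committed" if i < committed else "completed",
        }
        for i, (it, ins) in enumerate(issued)
    ]


# Loop issued twice; L.D and MUL.D of the first iteration have committed.
snapshot = rob_snapshot(iterations=2, committed=2)
pending = [e for e in snapshot if e["state"] == "completed"]
head = pending[0]  # oldest entry still in the ROB
```

Under these assumptions, eight instructions remain in the ROB, and the entry at the head is the S.D from the first iteration.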