Instruction-Level Parallelism and Its Exploitation - Computer Architecture: A Quantitative Approach


of Figure 3.8 on page 179, which shows the same code sequence in operation on a processor

with Tomasulo's algorithm. The key difference is that, in the example above, no instruction

after the earliest uncompleted instruction (MUL.D above) is allowed to complete. In contrast, in

Figure 3.8 the SUB.D and ADD.D instructions have also completed.

One implication of this difference is that the processor with the ROB can dynamically

execute code while maintaining a precise interrupt model. For example, if the MUL.D instruction

caused an interrupt, we could simply wait until it reached the head of the ROB and take the

interrupt, flushing any other pending instructions from the ROB. Because instruction commit

happens in order, this yields a precise exception.
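
The commit rule described above can be sketched in a few lines of Python. This is a minimal illustration, not the book's hardware design: the `RobEntry` fields, the `commit` function, and all register values are hypothetical, and F0 is assumed as the MUL.D destination because this excerpt does not show it.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class RobEntry:
    op: str                # mnemonic, e.g. "MUL.D"
    dest: str              # destination register name
    value: float           # result held in the ROB until commit
    done: bool = False     # has execution finished?
    faulted: bool = False  # was an exception recorded during execution?


def commit(rob, regs):
    """Retire finished entries from the head of the ROB in program order.

    Returns the mnemonic of the instruction that raised an exception,
    or None if no exception reached the head.
    """
    while rob and rob[0].done:
        head = rob.popleft()
        if head.faulted:
            rob.clear()  # flush every younger pending instruction
            return head.op
        regs[head.dest] = head.value  # registers change only at commit
    return None


# Hypothetical values; the ROB holds entries in program order.
regs = {"F0": 1.0, "F6": 6.0, "F8": 8.0}
rob = deque([
    RobEntry("MUL.D", "F0", 0.0, done=True, faulted=True),
    RobEntry("SUB.D", "F8", 80.0, done=True),
    RobEntry("ADD.D", "F6", 60.0, done=True),
])
before = dict(regs)
faulting_op = commit(rob, regs)
```

In this sketch SUB.D and ADD.D have finished executing, but their results never reach the register file: the faulting MUL.D arrives at the head first, the ROB is flushed, and `regs` still equals `before`.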

By contrast, in the example using Tomasulo's algorithm, the SUB.D and ADD.D instructions

could both complete before the MUL.D raised the exception. The result is that the registers F8

and F6 (destinations of the SUB.D and ADD.D instructions) could be overwritten, and the interrupt

would be imprecise.
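
For contrast, here is a minimal sketch of the same situation when results are written to the register file as soon as each instruction completes, with no reorder buffer. The function name, the tuple layout, and all values are hypothetical, and F0 is assumed as the MUL.D destination because this excerpt does not show it.

```python
def run_without_rob(completion_order, regs):
    """Write each result to the register file at completion time.

    Returns the mnemonic of the instruction that raised an exception,
    or None if none did.
    """
    for op, dest, value, faulted in completion_order:
        if faulted:
            return op  # writes made before this point cannot be undone
        regs[dest] = value
    return None


# Hypothetical values. The list is in completion order, not program order:
# MUL.D is the oldest instruction but finishes last.
regs = {"F0": 1.0, "F6": 6.0, "F8": 8.0}
completion_order = [
    ("SUB.D", "F8", 80.0, False),
    ("ADD.D", "F6", 60.0, False),
    ("MUL.D", "F0", 0.0, True),
]
faulting_op = run_without_rob(completion_order, regs)
```

When MUL.D raises its exception here, F8 and F6 already hold the results of the two younger instructions, so the register state no longer corresponds to any point in sequential program order.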

Some users and architects have decided that imprecise floating-point exceptions are acceptable

in high-performance processors, since the program will likely terminate; see Appendix J

for further discussion of this topic. Other types of exceptions, such as page faults, are much

more difficult to accommodate if they are imprecise, since the program must transparently

resume execution after handling such an exception.

The use of a ROB with in-order instruction commit provides precise exceptions, in addition

to supporting speculative execution, as the next example shows.

Example

Consider the code example used earlier for Tomasulo's algorithm and shown in

Figure 3.10 in execution:

Loop:   L.D     F0,0(R1)
        MUL.D   F4,F0,F2
        S.D     F4,0(R1)
        DADDIU  R1,R1,#-8
        BNE     R1,R2,Loop    ;branches if R1≠R2

Assume that we have issued all the instructions in the loop twice. Let's also

assume that the L.D and MUL.D from the first iteration have committed and all other

instructions have completed execution. Normally, the store would wait in the

ROB for both the effective address operand (R1 in this example) and the value

(F4 in this example). Since we are only considering the floating-point pipeline,

assume the effective address for the store is computed by the time the instruction

is issued.

Answer

Figure 3.13 shows the result in two tables.
