Pipelining: Basic and Intermediate Concepts - Computer Architecture: A Quantitative Approach


floating-point instruction. If so, the appropriate input multiplexer will have to be enabled so as to choose the forwarded data. In the exercises, you will have the opportunity to specify the logic for the RAW and WAW hazard detection as well as for forwarding.

Multicycle FP operations also introduce problems for our exception mechanisms, which we deal with next.

Maintaining Precise Exceptions

Another problem caused by these long-running instructions can be illustrated with the following sequence of code:

DIV.D F0,F2,F4

ADD.D F10,F10,F8

SUB.D F12,F12,F14

This code sequence looks straightforward; there are no dependences. A problem arises, however, because an instruction issued early may complete after an instruction issued later. In this example, we can expect ADD.D and SUB.D to complete before the DIV.D completes. This is called out-of-order completion and is common in pipelines with long-running operations (see Section C.7). Because hazard detection will prevent any dependence among instructions from being violated, why is out-of-order completion a problem? Suppose that the SUB.D causes a floating-point arithmetic exception at a point where the ADD.D has completed but the DIV.D has not. The result will be an imprecise exception, something we are trying to avoid. It may appear that this could be handled by letting the floating-point pipeline drain, as we do for the integer pipeline. But the exception may be in a position where this is not possible. For example, if the DIV.D decided to take a floating-point arithmetic exception after the ADD.D completed, we could not have a precise exception at the hardware level. In fact, because the ADD.D destroys one of its operands, we could not restore the state to what it was before the DIV.D, even with software help.
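The completion order described above can be sketched with a small model. This is only an illustration under stated assumptions: the latencies in `LATENCY` are hypothetical values chosen to make the divide long-running, one instruction issues per cycle, and an exception is assumed to be raised in the cycle an instruction completes. The function names are invented for this sketch.

```python
# Hypothetical cycles from issue to result write; not taken from the text.
LATENCY = {"DIV.D": 24, "ADD.D": 4, "SUB.D": 4}

program = [("DIV.D", "F0"), ("ADD.D", "F10"), ("SUB.D", "F12")]


def completion_order(program, latency=LATENCY):
    """Issue one instruction per cycle in program order; sort by finish cycle."""
    finished = []
    for issue_cycle, (op, dest) in enumerate(program):
        finished.append((issue_cycle + latency[op], op, dest))
    finished.sort()
    return [(op, dest, done) for done, op, dest in finished]


def exception_snapshot(program, faulting_index, latency=LATENCY):
    """State when the instruction at faulting_index raises at its completion.

    Returns (earlier instructions still running, later instructions already done).
    Either list being non-empty means the exception is imprecise.
    """
    fault_cycle = faulting_index + latency[program[faulting_index][0]]
    earlier_running = [
        op for i, (op, _) in enumerate(program)
        if i < faulting_index and i + latency[op] > fault_cycle
    ]
    later_done = [
        op for i, (op, _) in enumerate(program)
        if i > faulting_index and i + latency[op] <= fault_cycle
    ]
    return earlier_running, later_done


order = completion_order(program)
for op, dest, done in order:
    print(f"cycle {done:2d}: {op} writes {dest}")

print(exception_snapshot(program, 2))  # SUB.D faults while DIV.D is still running
print(exception_snapshot(program, 0))  # DIV.D faults after ADD.D has overwritten F10
```

Under these assumed latencies, both cases from the paragraph show up: a fault in SUB.D leaves an earlier instruction unfinished, and a fault in DIV.D arrives after later instructions have already changed register state.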

This problem arises because instructions are completing in a different order than they were issued. There are four possible approaches to dealing with out-of-order completion. The first is to ignore the problem and settle for imprecise exceptions. This approach was used in the 1960s and early 1970s. It is still used in some supercomputers, where certain classes of exceptions are not allowed or are handled by the hardware without stopping the pipeline. It is difficult to use this approach in most processors built today because of features such as virtual memory and the IEEE floating-point standard that essentially require precise exceptions through a combination of hardware and software. As mentioned earlier, some recent processors have solved this problem by introducing two modes of execution: a fast, but possibly imprecise mode and a slower, precise mode. The slower precise mode is implemented either with a mode switch or by insertion of explicit instructions that test for FP exceptions. In either case, the amount of overlap and reordering permitted in the FP pipeline is significantly restricted so that effectively only one FP instruction is active at a time. This solution is used in the DEC Alpha 21064 and 21164, in the IBM Power1 and Power2, and in the MIPS R8000.
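The cost of the slower precise mode can be estimated with a rough sketch. It assumes hypothetical latencies, one issue per cycle in the fast mode, and a precise mode in which each FP instruction waits for the previous one to complete. None of these numbers describe the processors named above.

```python
# Hypothetical cycles from issue to completion; not taken from the text.
LATENCY = {"DIV.D": 24, "ADD.D": 4, "SUB.D": 4}


def fast_mode_cycles(ops, latency=LATENCY):
    """Overlapped execution: issue one per cycle, finish when the last result lands."""
    return max(i + latency[op] for i, op in enumerate(ops))


def precise_mode_cycles(ops, latency=LATENCY):
    """Only one FP instruction active at a time: latencies simply add up."""
    return sum(latency[op] for op in ops)


ops = ["DIV.D", "ADD.D", "SUB.D"]
print("fast mode:   ", fast_mode_cycles(ops), "cycles")
print("precise mode:", precise_mode_cycles(ops), "cycles")
```

With these assumed values the serialized mode takes 32 cycles against 24 for the overlapped mode; the gap grows as more short operations could otherwise hide behind a long one.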

A second approach is to buffer the results of an operation until all the operations that were issued earlier are complete. Some CPUs actually use this solution, but it becomes expensive when the difference in running times among operations is large, since the number of results to buffer can become large. Furthermore, results from the queue must be bypassed to continue issuing instructions while waiting for the longer instruction. This requires a large number of comparators and a very large multiplexer.
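The buffering idea can be sketched in software as a queue that commits results to the register file only from its oldest end. The class `ResultBuffer`, its methods, and the instruction tags are hypothetical names for this illustration, not a description of any real design. The `read` method stands in for the bypass path, and its linear search hints at why hardware needs one comparator per buffered entry.

```python
class ResultBuffer:
    """Hold finished results until all earlier-issued instructions finish."""

    def __init__(self):
        self.entries = {}     # tag -> [dest, value, done]; dict keeps issue order
        self.registers = {}   # architectural register file

    def issue(self, tag, dest):
        self.entries[tag] = [dest, None, False]

    def complete(self, tag, value):
        """Record a result, then commit from the oldest entry while possible."""
        entry = self.entries[tag]
        entry[1], entry[2] = value, True
        committed = []
        for t in list(self.entries):
            dest, val, done = self.entries[t]
            if not done:
                break                      # an older instruction is still running
            self.registers[dest] = val
            del self.entries[t]
            committed.append(t)
        return committed

    def read(self, reg):
        """Bypass: prefer the youngest finished buffered result for reg.

        Simplification: ignores the stall a real pipeline needs when a
        younger writer of reg is still running.
        """
        for dest, val, done in reversed(list(self.entries.values())):
            if dest == reg and done:
                return val
        return self.registers.get(reg)


buf = ResultBuffer()
for tag, dest in [("DIV.D", "F0"), ("ADD.D", "F10"), ("SUB.D", "F12")]:
    buf.issue(tag, dest)

first = buf.complete("ADD.D", 1.5)     # buffered, nothing commits yet
bypassed = buf.read("F10")             # value comes from the buffer
second = buf.complete("SUB.D", 2.5)    # still waiting on DIV.D
regs_before = dict(buf.registers)
third = buf.complete("DIV.D", 0.5)     # everything commits in issue order
print(first, second, third, buf.registers)
```

In this sketch a fault in DIV.D would find the register file untouched by ADD.D and SUB.D, at the cost of holding two results for the whole duration of the divide.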

There are two viable variations on this basic approach. The first is a history file, used in the Cyber 180/990. The history file keeps track of the original values of registers. When an excep-
