floating-point instruction. If so, the appropriate input multiplexer will have to be enabled so
as to choose the forwarded data. In the exercises, you will have the opportunity to specify the
logic for the RAW and WAW hazard detection as well as for forwarding.
Multicycle FP operations also introduce problems for our exception mechanisms, which we'll
deal with next.
Maintaining Precise Exceptions
Another problem caused by these long-running instructions can be illustrated with the
following sequence of code:
DIV.D F0,F2,F4
ADD.D F10,F10,F8
SUB.D F12,F12,F14
This code sequence looks straightforward; there are no dependences. A problem arises,
however, because an instruction issued early may complete after an instruction issued later. In
this example, we can expect ADD.D and SUB.D to complete before the DIV.D completes. This is called
out-of-order completion and is common in pipelines with long-running operations (see Section
C.7). Because hazard detection will prevent any dependence among instructions from being
violated, why is out-of-order completion a problem? Suppose that the SUB.D causes a floating-
point arithmetic exception at a point where the ADD.D has completed but the DIV.D has not. The
result will be an imprecise exception, something we are trying to avoid. It may appear that this
could be handled by letting the floating-point pipeline drain, as we do for the integer pipeline.
But the exception may be in a position where this is not possible. For example, if the DIV.D
decided to take a floating-point-arithmetic exception after the ADD.D completed, we could not have
a precise exception at the hardware level. In fact, because the ADD.D destroys one of its
operands, we could not restore the state to what it was before the DIV.D, even with software help.
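The timing behind this example can be sketched in a few lines of Python. The latencies used here (24 cycles for DIV.D, 4 for ADD.D and SUB.D), the one-issue-per-cycle model with no stalls, and the choice to raise an exception at the cycle an instruction completes are all illustrative assumptions rather than figures from this section; completion_order and incomplete_older are hypothetical helper names.

```python
# Assumed execute latencies in cycles; illustrative only, not taken from the text.
LATENCY = {"DIV.D": 24, "ADD.D": 4, "SUB.D": 4}


def completion_order(program, latency=LATENCY):
    """Return (op, issue_cycle, complete_cycle) tuples sorted by completion cycle.

    Assumes one instruction issues per cycle, in program order, with no stalls.
    """
    timeline = [
        (op, issue_cycle, issue_cycle + latency[op])
        for issue_cycle, op in enumerate(program, start=1)
    ]
    return sorted(timeline, key=lambda entry: entry[2])


def incomplete_older(program, faulting_index, latency=LATENCY):
    """List older instructions still in flight when the faulting one completes.

    Assumes the exception is raised in the cycle the faulting instruction
    completes. A non-empty result means the exception would be imprecise.
    """
    fault_cycle = faulting_index + 1 + latency[program[faulting_index]]
    return [
        op
        for i, op in enumerate(program[:faulting_index])
        if i + 1 + latency[op] > fault_cycle
    ]


program = ["DIV.D", "ADD.D", "SUB.D"]
order = completion_order(program)
for op, issued, done in order:
    print(f"{op}: issued in cycle {issued}, completes in cycle {done}")

# SUB.D (index 2) faults while the older DIV.D is still executing.
still_running = incomplete_older(program, 2)
print("Older instructions not yet complete:", still_running)
```

Under these assumptions the ADD.D and SUB.D both finish long before the DIV.D, so an exception raised by the SUB.D finds one older instruction incomplete and one older instruction already written back, which is exactly the situation described above.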
This problem arises because instructions are completing in a different order than they were
issued. There are four possible approaches to dealing with out-of-order completion. The first is
to ignore the problem and settle for imprecise exceptions. This approach was used in the 1960s
and early 1970s. It is still used in some supercomputers, where certain classes of exceptions are
not allowed or are handled by the hardware without stopping the pipeline. It is difficult to use
this approach in most processors built today because of features such as virtual memory and
the IEEE floating-point standard that essentially require precise exceptions through a
combination of hardware and software. As mentioned earlier, some recent processors have solved this
problem by introducing two modes of execution: a fast, but possibly imprecise mode and a
slower, precise mode. The slower precise mode is implemented either with a mode switch or
by insertion of explicit instructions that test for FP exceptions. In either case, the amount of
overlap and reordering permitted in the FP pipeline is significantly restricted so that
effectively only one FP instruction is active at a time. This solution is used in the DEC Alpha 21064
and 21164, in the IBM Power1 and Power2, and in the MIPS R8000.
A second approach is to buffer the results of an operation until all the operations that were
issued earlier are complete. Some CPUs actually use this solution, but it becomes expensive
when the difference in running times among operations is large, since the number of results to
buffer can become large. Furthermore, results from the queue must be bypassed to continue
issuing instructions while waiting for the longer instruction. This requires a large number of
comparators and a very large multiplexer.
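The buffering idea can be sketched as a small software model. This is a minimal sketch under stated assumptions, not a description of any particular CPU: the ResultBuffer class, its method names, and the use of a sequence number to record issue order are all hypothetical. Results are held in the buffer and written to the register file only once every earlier-issued operation has completed; read models the bypass path by searching the buffer before the register file.

```python
class ResultBuffer:
    """Holds completed results until all earlier-issued operations are done."""

    def __init__(self):
        self.pending = {}    # seq -> [dest, value, done], kept in issue order
        self.registers = {}  # architectural register file
        self.next_seq = 0

    def issue(self, dest):
        """Record an operation in issue order and return its sequence number."""
        seq = self.next_seq
        self.next_seq += 1
        self.pending[seq] = [dest, None, False]
        return seq

    def complete(self, seq, value):
        """Mark an operation finished, then write back whatever is now safe."""
        entry = self.pending[seq]
        entry[1], entry[2] = value, True
        self._retire()

    def _retire(self):
        # Write back from the oldest entry forward, stopping at the first
        # operation that has not completed yet.
        while self.pending:
            seq = next(iter(self.pending))
            dest, value, done = self.pending[seq]
            if not done:
                break
            self.registers[dest] = value
            del self.pending[seq]

    def read(self, reg):
        """Bypass: prefer the newest buffered writer of reg over the register file."""
        for dest, value, done in reversed(list(self.pending.values())):
            if dest == reg:
                if not done:
                    # Real hardware would stall here (a RAW hazard).
                    raise RuntimeError(f"{reg} is not ready yet")
                return value
        return self.registers.get(reg)


buf = ResultBuffer()
div = buf.issue("F0")    # DIV.D F0,F2,F4
add = buf.issue("F10")   # ADD.D F10,F10,F8
sub = buf.issue("F12")   # SUB.D F12,F12,F14

buf.complete(add, 3.0)   # finishes early, but must wait behind the DIV.D
buf.complete(sub, 1.5)
print(buf.registers)     # nothing written back while the DIV.D is in flight
print(buf.read("F10"))   # value supplied from the buffer via bypass
buf.complete(div, 0.5)
print(buf.registers)     # all three results written back in issue order
```

The linear search in read stands in for the comparators and multiplexer mentioned above: every buffered entry is a potential source for a later instruction, so the cost of bypassing grows with the number of results that must be held.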
There are two viable variations on this basic approach. The first is a history file, used in the
Cyber 180/990. The history file keeps track of the original values of registers. When an excep-