tion occurs and the state must be rolled back earlier than some instruction that completed out
of order, the original value of the register can be restored from the history file. A similar tech-
nique is used for autoincrement and autodecrement addressing on processors such as VAXes.
Another approach, the future file, proposed by Smith and Pleszkun [1988], keeps the newer
value of a register; when all earlier instructions have completed, the main register file is up-
dated from the future file. On an exception, the main register file has the precise values for the
interrupted state. In Chapter 3, we saw extensions of this idea that are used in processors
such as the PowerPC 620 and the MIPS R10000 to allow overlap and reordering while pre-
serving precise exceptions.
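
The difference between the two files can be illustrated with a minimal Python sketch. It is a
toy model, not a description of any real processor: the class names, the sequence numbers
standing in for program order, and the assumption that writes to the same register occur in
program order are all introduced here for illustration.

```python
class HistoryFile:
    """Registers are updated at once; old values are logged for rollback."""

    def __init__(self, num_regs):
        self.regs = [0] * num_regs
        self.log = []  # (seq, reg, old_value), in completion order

    def write(self, seq, reg, value):
        # Save the value being overwritten before updating the register.
        self.log.append((seq, reg, self.regs[reg]))
        self.regs[reg] = value

    def rollback(self, fault_seq):
        # Undo, newest first, every write made by an instruction that
        # follows the faulting one in program order.
        for seq, reg, old in reversed(self.log):
            if seq > fault_seq:
                self.regs[reg] = old
        self.log = [e for e in self.log if e[0] <= fault_seq]


class FutureFile:
    """Newest values go to a future file; the main file stays precise."""

    def __init__(self, num_regs):
        self.main = [0] * num_regs    # precise architectural state
        self.future = [0] * num_regs  # newest values, read by execution
        self.pending = {}             # seq -> (reg, value), not yet in main

    def write(self, seq, reg, value):
        self.future[reg] = value
        self.pending[seq] = (reg, value)

    def commit_through(self, seq):
        # Caller guarantees every instruction up to seq has completed.
        for s in sorted(k for k in self.pending if k <= seq):
            reg, value = self.pending.pop(s)
            self.main[reg] = value

    def on_exception(self):
        # The main file already holds the precise state; discard the rest.
        self.pending.clear()
        self.future = list(self.main)
```

In both models, if instruction 3 completes before instruction 2 and instruction 2 then raises an
exception, the write made by instruction 3 is not visible in the state seen by the handler.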
A third technique in use is to allow the exceptions to become somewhat imprecise, but to
keep enough information so that the trap-handling routines can create a precise sequence for
the exception. This means knowing what operations were in the pipeline and their PCs. Then,
after handling the exception, the software finishes any instructions that precede the latest in-
struction completed, and the sequence can restart. Consider the following worst-case code se-
quence:
Instruction 1: A long-running instruction that eventually interrupts execution.
Instruction 2, …, Instruction n-1: A series of instructions that are not completed.
Instruction n: An instruction that is finished.
Given the PCs of all the instructions in the pipeline and the exception return PC, the software
can find the state of instruction 1 and instruction n. Because instruction n has completed, we will
want to restart execution at instruction n+1. After handling the exception, the software must
simulate the execution of instruction 1, …, instruction n-1. Then we can return from the exception
and restart at instruction n+1. The complexity of executing these instructions properly by the
handler is the major difficulty of this scheme.
There is an important simplification for simple MIPS-like pipelines: If instruction 2, …,
instruction n are all integer instructions, we know that if instruction n has completed, then all of
instruction 2, …, instruction n-1 have also completed. Thus, only FP operations need to be handled.
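
The bookkeeping the trap handler needs can be sketched in a few lines of Python. This is only an
illustration under stated assumptions: the InFlight record, the plan_recovery function, the
fixed 4-byte instruction size, and straight-line code with no branches in flight are all
hypothetical and are not taken from any real handler.

```python
from dataclasses import dataclass


@dataclass
class InFlight:
    pc: int          # address of the instruction
    completed: bool  # has it already written its result?


def plan_recovery(in_flight, instr_bytes=4):
    """Return (PCs the handler must simulate, PC at which to restart).

    in_flight is in program order; element 0 is the interrupting
    instruction. Assumes fixed-size instructions and no branches.
    """
    done = [i for i, ins in enumerate(in_flight) if ins.completed]
    if not done:
        # Nothing completed out of order, so the state is already precise
        # (an assumption of this sketch: restart at the interrupting PC).
        return [], in_flight[0].pc
    n = done[-1]  # the latest completed instruction
    # Everything before instruction n that has not completed must be
    # finished in software before execution resumes at instruction n+1.
    to_simulate = [ins.pc for ins in in_flight[:n] if not ins.completed]
    return to_simulate, in_flight[n].pc + instr_bytes


# Worst case from the text: only the last instruction has completed.
worst = [InFlight(0x100, False), InFlight(0x104, False),
         InFlight(0x108, False), InFlight(0x10C, True)]

# With the integer simplification, the intervening integer instructions
# have already completed, so only the FP instruction at 0x100 remains.
simple = [InFlight(0x100, False), InFlight(0x104, True),
          InFlight(0x108, True), InFlight(0x10C, True)]
```

For the worst-case list the handler must simulate all three uncompleted instructions; for the
second list it must simulate only the interrupting one. In both cases execution resumes at 0x110.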
To make this scheme tractable, the number of floating-point instructions that can be over-
lapped in execution can be limited. For example, if we only overlap two instructions, then only
the interrupting instruction need be completed by software. This restriction may reduce the
potential throughput if the FP pipelines are deep or if there are a significant number of FP
functional units. This approach is used in the SPARC architecture to allow overlap of
floating-point and integer operations.
The final technique is a hybrid scheme that allows the instruction issue to continue only if it
is certain that all the instructions before the issuing instruction will complete without causing
an exception. This guarantees that when an exception occurs, no instructions after the inter-
rupting one will be completed and all of the instructions before the interrupting one can be
completed. This sometimes means stalling the CPU to maintain precise exceptions. To make
this scheme work, the floating-point functional units must determine if an exception is pos-
sible early in the EX stage (in the first 3 clock cycles in the MIPS pipeline), so as to prevent
further instructions from completing. This scheme is used in the MIPS R2000/3000, the R4000,
and the Intel Pentium. It is discussed further in Appendix J.
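
The cost of this issue rule can be seen in a small Python model. It is a simplified sketch, not
the logic of any of the processors named above: the function issue_cycles and its check_cycles
parameter (the number of cycles after issue by which an instruction is known to be free of
exceptions) are assumptions introduced here, and all other hazards are ignored.

```python
def issue_cycles(check_cycles):
    """Return the cycle in which each instruction issues.

    check_cycles[i] is the number of cycles after issue by which
    instruction i is known not to cause an exception (hypothetical
    parameter; 0 for an instruction that can never trap late).
    """
    issue = []
    safe_at = 0  # cycle by which all earlier instructions are known safe
    for c in check_cycles:
        # At most one instruction issues per cycle, and none issues until
        # every earlier instruction is guaranteed to complete cleanly.
        earliest = issue[-1] + 1 if issue else 0
        t = max(earliest, safe_at)
        issue.append(t)
        safe_at = max(safe_at, t + c)
    return issue


# An FP operation that needs 3 cycles to rule out an exception,
# surrounded by integer operations that need none.
cycles = issue_cycles([0, 3, 0, 0])
```

Here the instruction after the FP operation issues in cycle 4 rather than cycle 2, so the model
charges two stall cycles to preserve precise exceptions.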
Performance of a MIPS FP Pipeline
The MIPS FP pipeline of Figure C.35 on page C-54 can generate both structural stalls for the
divide unit and stalls for RAW hazards (it also can have WAW hazards, but this rarely occurs
in practice). Figure C.39 shows the number of stall cycles for each type of floating-point op-