the condition code must be treated as an operand that requires hazard detection for RAW
hazards with branches, just as MIPS must do on the registers.
A final thorny area in pipelining is multicycle operations. Imagine trying to pipeline a
sequence of VAX instructions such as this:
MOVL R1,R2 ;moves between registers
ADDL3 42(R1),56(R1)+,@(R1) ;adds memory locations
SUBL2 R2,R3 ;subtracts registers
MOVC3 @(R1)[R2],74(R2),R3 ;moves a character string
These instructions differ radically in the number of clock cycles they will require, from
as few as one to hundreds of clock cycles. They also require different numbers of data
memory accesses, from zero to possibly hundreds. The data hazards are very complex and
occur both between and within instructions. The simple solution of making all instructions
execute for the same number of clock cycles is unacceptable because it introduces an enormous
number of hazards and bypass conditions and makes an immensely long pipeline. Pipelining
the VAX at the instruction level is difficult, but a clever solution was found by the VAX 8800
designers. They pipeline the microinstruction execution; a microinstruction is a simple
instruction used in sequences to implement a more complex instruction set. Because the
microinstructions are simple (they look a lot like MIPS), the pipeline control is much easier.
Since 1995, all Intel IA-32 microprocessors have used this strategy of converting the IA-32
instructions into microoperations, and then pipelining the microoperations.
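As a rough illustration of the idea of converting a complex instruction into simple operations, the sketch below splits a hypothetical memory-to-memory add into load, add, and store steps. The names `MicroOp` and `crack_mem_add`, the temporaries `t1` to `t3`, and the operand encoding are all assumptions made for this example; it does not reproduce the actual VAX 8800 microcode or any Intel microoperation format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MicroOp:
    """One simple, load-store style operation (hypothetical format)."""
    op: str       # "load", "add", or "store"
    dst: str      # destination temporary, or "" when nothing is written to a register
    srcs: tuple   # source operands: register names, temporaries, or offsets


def crack_mem_add(base: str, off1: int, off2: int, off3: int) -> list:
    """Split mem[base+off3] = mem[base+off1] + mem[base+off2] into micro-ops.

    Each micro-op does a single memory access or a single ALU operation,
    so every step does a similar amount of work and is easy to pipeline.
    """
    return [
        MicroOp("load", "t1", (base, off1)),       # t1 <- mem[base + off1]
        MicroOp("load", "t2", (base, off2)),       # t2 <- mem[base + off2]
        MicroOp("add", "t3", ("t1", "t2")),        # t3 <- t1 + t2
        MicroOp("store", "", ("t3", base, off3)),  # mem[base + off3] <- t3
    ]


if __name__ == "__main__":
    for uop in crack_mem_add("R1", 42, 56, 0):
        print(uop)
```

The single complex instruction becomes four simple steps, and hazards between them can then be detected on named temporaries in the same way a MIPS pipeline detects them on registers.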
In comparison, load-store processors have simple operations with similar amounts of work
and pipeline more easily. If architects realize the relationship between instruction set design
and pipelining, they can design architectures for more efficient pipelining. In the next section,
we will see how the MIPS pipeline deals with long-running instructions, specifically
floating-point operations.
For many years, the interaction between instruction sets and implementations was believed
to be small, and implementation issues were not a major focus in designing instruction sets.
In the 1980s, it became clear that the difficulty and inefficiency of pipelining could both be
increased by instruction set complications. In the 1990s, all companies moved to simpler
instruction sets with the goal of reducing the complexity of aggressive implementations.
C.5 Extending the MIPS Pipeline to Handle Multicycle Operations
We now want to explore how our MIPS pipeline can be extended to handle floating-point
operations. This section concentrates on the basic approach and the design alternatives, closing
with some performance measurements of a MIPS floating-point pipeline.
It is impractical to require that all MIPS FP operations complete in 1 clock cycle, or even in
2. Doing so would mean accepting a slow clock or using enormous amounts of logic in the
FP units, or both. Instead, the FP pipeline will allow for a longer latency for operations. This
is easier to grasp if we imagine the FP instructions as having the same pipeline as the integer
instructions, with two important changes. First, the EX cycle may be repeated as many times
as needed to complete the operation; the number of repetitions can vary for different
operations. Second, there may be multiple FP functional units. A stall will occur if the
instruction to be issued will cause either a structural hazard for the functional unit it uses
or a data hazard.
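The stall rule just described can be sketched as a small in-order issue model. This is a simplified illustration under stated assumptions, not the pipeline's actual control logic: each functional unit is unpipelined (it stays busy while EX repeats), a result can be used in the cycle after its last EX cycle, only structural and RAW hazards are checked, and the unit names and latencies in the usage example are hypothetical.

```python
def issue_cycles(program, latency):
    """Return the cycle in which each instruction begins EX.

    program: list of (unit, dest_reg, source_regs) tuples, in program order
    latency: dict mapping unit name -> number of EX cycles (assumed values)
    """
    unit_free = {}   # unit -> first cycle in which the unit is idle again
    reg_ready = {}   # register -> first cycle in which its new value is usable
    next_slot = 0    # in-order issue: at most one instruction enters EX per cycle
    starts = []
    for unit, dest, srcs in program:
        start = next_slot
        # Structural hazard: wait until the functional unit is free.
        start = max(start, unit_free.get(unit, 0))
        # RAW data hazard: wait until every source operand has been produced.
        for reg in srcs:
            start = max(start, reg_ready.get(reg, 0))
        done = start + latency[unit]   # EX repeats latency[unit] times
        unit_free[unit] = done
        reg_ready[dest] = done
        starts.append(start)
        next_slot = start + 1          # later instructions cannot pass this one
    return starts


if __name__ == "__main__":
    latency = {"int": 1, "fadd": 4, "fmul": 7}   # hypothetical EX cycle counts
    program = [
        ("fmul", "F0", ("F2", "F4")),
        ("fadd", "F6", ("F0", "F8")),     # RAW on F0: waits for the multiply
        ("fadd", "F10", ("F12", "F14")),  # structural: FP adder is still busy
        ("int", "R1", ("R2",)),
    ]
    print(issue_cycles(program, latency))  # [0, 7, 11, 12]
```

In this run the first add stalls until the multiply result is available, the second add stalls because the adder is occupied, and the integer instruction waits only because issue is in order.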
For this section, let's assume that there are four separate functional units in our MIPS
implementation:
 