Pipelining: Basic and Intermediate Concepts - Computer Architecture: A Quantitative Approach

the condition code must be treated as an operand that requires hazard detection for RAW hazards with branches, just as MIPS must do on the registers.

A final thorny area in pipelining is multicycle operations. Imagine trying to pipeline a sequence of VAX instructions such as this:

MOVL R1,R2 ;moves between registers

ADDL3 42(R1),56(R1)+,@(R1) ;adds memory locations

SUBL2 R2,R3 ;subtracts registers

MOVC3 @(R1)[R2],74(R2),R3 ;moves a character string

These instructions differ radically in the number of clock cycles they will require, from as low as one up to hundreds of clock cycles. They also require different numbers of data memory accesses, from zero to possibly hundreds. The data hazards are very complex and occur both between and within instructions. The simple solution of making all instructions execute for the same number of clock cycles is unacceptable because it introduces an enormous number of hazards and bypass conditions and makes an immensely long pipeline. Pipelining the VAX at the instruction level is difficult, but a clever solution was found by the VAX 8800 designers. They pipeline the microinstruction execution; a microinstruction is a simple instruction used in sequences to implement a more complex instruction set. Because the microinstructions are simple (they look a lot like MIPS), the pipeline control is much easier. Since 1995, all Intel IA-32 microprocessors have used this strategy of converting the IA-32 instructions into microoperations, and then pipelining the microoperations.
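The idea of breaking a complex instruction into simple, MIPS-like steps can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: the function `crack_add3`, the micro-op mnemonics `LOAD`, `ADD`, and `STORE`, and the temporaries `T0` to `T2` are hypothetical and do not reflect the actual VAX 8800 microcode or Intel's microoperation format. The sketch handles only base-plus-displacement operands and ignores autoincrement and deferred addressing.

```python
def crack_add3(src1, src2, dst):
    """Expand a three-operand memory-to-memory add into load-store micro-ops.

    Each operand is a (base_register, displacement) pair. The temporaries
    T0..T2 and the micro-op names are hypothetical, chosen for illustration.
    """
    (base1, disp1), (base2, disp2), (base_d, disp_d) = src1, src2, dst
    return [
        ("LOAD", "T0", base1, disp1),     # T0 <- Mem[base1 + disp1]
        ("LOAD", "T1", base2, disp2),     # T1 <- Mem[base2 + disp2]
        ("ADD", "T2", "T0", "T1"),        # T2 <- T0 + T1
        ("STORE", "T2", base_d, disp_d),  # Mem[base_d + disp_d] <- T2
    ]


# A simplified stand-in for the memory-to-memory add in the sequence above.
micro_ops = crack_add3(("R1", 42), ("R1", 56), ("R1", 0))
for op in micro_ops:
    print(op)
```

Each resulting step performs at most one memory access or one ALU operation, which is why a pipeline of such steps needs far simpler control than a pipeline of the original instructions.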

In comparison, load-store processors have simple operations with similar amounts of work and pipeline more easily. If architects realize the relationship between instruction set design and pipelining, they can design architectures for more efficient pipelining. In the next section, we will see how the MIPS pipeline deals with long-running instructions, specifically floating-point operations.

For many years, the interaction between instruction sets and implementations was believed to be small, and implementation issues were not a major focus in designing instruction sets. In the 1980s, it became clear that the difficulty and inefficiency of pipelining could both be increased by instruction set complications. In the 1990s, all companies moved to simpler instruction sets with the goal of reducing the complexity of aggressive implementations.

C.5 Extending the MIPS Pipeline to Handle Multicycle Operations

We now want to explore how our MIPS pipeline can be extended to handle floating-point operations. This section concentrates on the basic approach and the design alternatives, closing with some performance measurements of a MIPS floating-point pipeline.

It is impractical to require that all MIPS FP operations complete in 1 clock cycle, or even in 2. Doing so would mean accepting a slow clock or using enormous amounts of logic in the FP units, or both. Instead, the FP pipeline will allow for a longer latency for operations. This is easier to grasp if we imagine the FP instructions as having the same pipeline as the integer instructions, with two important changes. First, the EX cycle may be repeated as many times as needed to complete the operation; the number of repetitions can vary for different operations. Second, there may be multiple FP functional units. A stall will occur if the instruction to be issued will cause either a structural hazard for the functional unit it uses or a data hazard.
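The stall rule just stated can be sketched as a small in-order issue model. This is an illustration under stated assumptions, not the book's pipeline: the unit names, the EX latencies, and the `schedule` function are hypothetical; each functional unit is treated as unpipelined (it stays busy for every repeated EX cycle); a result is assumed to be forwarded to a consumer in the cycle after the producer's last EX cycle; and only RAW data hazards are modeled.

```python
from dataclasses import dataclass


@dataclass
class Instr:
    name: str
    unit: str       # functional unit used in EX (hypothetical names)
    dest: str       # destination register
    srcs: tuple     # source registers
    ex_cycles: int  # number of times the EX cycle repeats (assumed latency)


def schedule(program):
    """Return (name, ex_start_cycle, stall_cycles, cause) per instruction."""
    unit_free_at = {}  # unit -> first cycle it can accept a new instruction
    reg_ready_at = {}  # register -> first cycle a consumer may start EX
    results = []
    cycle = 1          # cycle the next instruction would start EX with no stall
    for ins in program:
        structural = unit_free_at.get(ins.unit, 0)
        data = max((reg_ready_at.get(r, 0) for r in ins.srcs), default=0)
        start = max(cycle, structural, data)
        stalls = start - cycle
        if stalls == 0:
            cause = None
        else:
            cause = "data" if data >= structural else "structural"
        # The unit stays busy, and the result stays pending, until EX ends.
        unit_free_at[ins.unit] = start + ins.ex_cycles
        reg_ready_at[ins.dest] = start + ins.ex_cycles
        results.append((ins.name, start, stalls, cause))
        cycle = start + 1  # in-order issue: the next instruction follows
    return results


# Hypothetical latencies: 7 EX cycles for FP multiply, 4 for FP add, 1 for integer.
PROGRAM = [
    Instr("MUL.D F0,F2,F4", "fp_mul", "F0", ("F2", "F4"), 7),
    Instr("ADD.D F6,F0,F8", "fp_add", "F6", ("F0", "F8"), 4),
    Instr("ADD.D F10,F12,F14", "fp_add", "F10", ("F12", "F14"), 4),
    Instr("DADD R1,R2,R3", "int", "R1", ("R2", "R3"), 1),
]

for row in schedule(PROGRAM):
    print(row)
```

Under these assumed latencies, the first ADD.D waits on F0 from the multiply (a data hazard), and the second ADD.D waits for the FP adder to become free (a structural hazard), while the independent integer instruction issues without stalling.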

For this section, let's assume that there are four separate functional units in our MIPS implementation:
