Although many figures will omit such registers for simplicity, they are required to make the
pipeline operate properly and must be present. Of course, similar registers would be needed
even in a multicycle data path that had no pipelining (since only values in registers are pre-
served across clock boundaries). In the case of a pipelined processor, the pipeline registers also
play the key role of carrying intermediate results from one stage to another where the source
and destination may not be directly adjacent. For example, the register value to be stored dur-
ing a store instruction is read during ID, but not actually used until MEM; it is passed through
two pipeline registers to reach the data memory during the MEM stage. Likewise, the res-
ult of an ALU instruction is computed during EX, but not actually stored until WB; it arrives
there by passing through two pipeline registers. It is sometimes useful to name the pipeline
registers, and we follow the convention of naming them by the pipeline stages they connect,
so that the registers are called IF/ID, ID/EX, EX/MEM, and MEM/WB.
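To make the role of these registers concrete, the following minimal Python sketch (not from the text) models each pipeline register as a dictionary whose contents are latched forward once per clock edge; field names such as store_data are hypothetical and serve only to show a store's register value riding through ID/EX and EX/MEM until the MEM stage needs it.

# Purely illustrative model: each pipeline register is a dictionary that is
# copied forward on every clock edge. Stage logic is omitted, and field names
# such as "store_data" are hypothetical.
def tick(if_id, id_ex, ex_mem, mem_wb):
    """Advance one cycle: every register latches what the stage before it produced."""
    return {}, dict(if_id), dict(id_ex), dict(ex_mem)

# A store's register value is read during ID and written into ID/EX ...
if_id, id_ex, ex_mem, mem_wb = {}, {"opcode": "sw", "store_data": 0x1234}, {}, {}

# ... one clock edge later it has been carried into EX/MEM, where the data
# memory can finally use it during the MEM stage.
if_id, id_ex, ex_mem, mem_wb = tick(if_id, id_ex, ex_mem, mem_wb)
print(ex_mem)   # {'opcode': 'sw', 'store_data': 4660}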
Basic Performance Issues in Pipelining
Pipelining increases the CPU instruction throughput—the number of instructions completed
per unit of time—but it does not reduce the execution time of an individual instruction. In fact,
it usually slightly increases the execution time of each instruction due to overhead in the con-
trol of the pipeline. The increase in instruction throughput means that a program runs faster
and has lower total execution time, even though no single instruction runs faster!
The fact that the execution time of each instruction does not decrease puts limits on the prac-
tical depth of a pipeline, as we will see in the next section. In addition to limitations arising
from pipeline latency, limits arise from imbalance among the pipe stages and from pipelining
overhead. Imbalance among the pipe stages reduces performance since the clock can run no
faster than the time needed for the slowest pipeline stage. Pipeline overhead arises from the
combination of pipeline register delay and clock skew. The pipeline registers add setup time,
which is the time that a register input must be stable before the clock signal that triggers a
write occurs, plus propagation delay to the clock cycle. Clock skew, which is the maximum delay
between when the clock arrives at any two registers, also contributes to the lower limit on the
clock cycle. Once the clock cycle is as small as the sum of the clock skew and latch overhead,
no further pipelining is useful, since there is no time left in the cycle for useful work. The
interested reader should see Kunkel and Smith [1986]. As we saw in Chapter 3, this overhead
affected the performance gains achieved by the Pentium 4 versus the Pentium III.
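As a rough back-of-the-envelope illustration (the numbers below are invented, not taken from the text), dividing a fixed amount of logic into more stages shrinks the logic delay per stage while the register overhead and clock skew stay constant, so the cycle time approaches a floor:

# Hedged illustration with invented numbers: 4.0 ns of total logic split into
# perfectly balanced stages, plus a fixed 0.1 ns of latch overhead and clock
# skew per stage. Deeper pipelines shrink the logic per stage, but the
# overhead term remains, so the clock cycle approaches a floor.
TOTAL_LOGIC_NS = 4.0
OVERHEAD_NS = 0.1  # register setup/propagation plus clock skew (assumed)

for stages in (1, 2, 4, 8, 16, 32):
    cycle = TOTAL_LOGIC_NS / stages + OVERHEAD_NS
    print(f"{stages:2d} stages: cycle = {cycle:5.3f} ns, "
          f"throughput = {1.0 / cycle:5.2f} instructions/ns")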
Example
Consider the unpipelined processor in the previous section. Assume that it has
a 1 ns clock cycle and that it uses 4 cycles for ALU operations and branches and
5 cycles for memory operations. Assume that the relative frequencies of these
operations are 40%, 20%, and 40%, respectively. Suppose that due to clock skew
and setup, pipelining the processor adds 0.2 ns of overhead to the clock. Ignor-
ing any latency impact, how much speedup in the instruction execution rate
will we gain from a pipeline?
Answer
The average instruction execution time on the unpipelined processor is

Average instruction execution time = Clock cycle × Average CPI
= 1 ns × ((40% + 20%) × 4 + 40% × 5)
= 1 ns × 4.4
= 4.4 ns

In the pipelined implementation, the clock must run at the speed of the slowest stage plus overhead, which will be 1 + 0.2 or 1.2 ns; this is the average instruction execution time. Thus, the speedup from pipelining is

Speedup from pipelining = Average instruction time unpipelined / Average instruction time pipelined
= 4.4 ns / 1.2 ns ≈ 3.7 times

The 0.2 ns overhead essentially establishes a limit on the effectiveness of pipelining.
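As a quick check of the arithmetic, a few lines of Python reproduce the same figures (the variable names are ours, not from the text):

# Sanity check of the example's numbers.
cycle_ns = 1.0
unpipelined_avg = cycle_ns * ((0.40 + 0.20) * 4 + 0.40 * 5)  # 4.4 ns average
pipelined_avg = cycle_ns + 0.2                               # slowest stage + 0.2 ns overhead
print(f"speedup = {unpipelined_avg / pipelined_avg:.1f}x")   # prints speedup = 3.7x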