Pipelining: Basic and Intermediate Concepts - Computer Architecture: A Quantitative Approach

FIGURE C.29 This revised pipeline structure is based on the original in Figure C.23. It uses a separate adder, as in Figure C.28, to compute the branch-target address during ID. The operations that are new or have changed are in bold. Because the branch-target address addition happens during ID, it will happen for all instructions; the branch condition (Regs[IF/ID.IR[6..10]] op 0) will also be done for all instructions. The selection of the sequential PC or the branch-target PC still occurs during IF, but it now uses values from the ID stage that correspond to the values set by the previous instruction. This change reduces the branch penalty by 2 cycles: one from evaluating the branch target and condition earlier and one from controlling the PC selection on the same clock rather than on the next clock. Since the value of cond is set to 0 unless the instruction in ID is a taken branch, the processor must decode the instruction before the end of ID. Because the branch is done by the end of ID, the EX, MEM, and WB stages are unused for branches. An additional complication arises for jumps that have a longer offset than branches. We can resolve this by using an additional adder that sums the PC and lower 26 bits of the IR after shifting left by 2 bits.
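The ID-stage work described in the caption can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the book's register-transfer notation: the function names, the `is_branch` flag standing in for the decoder, and the sign extension of both offsets are hypothetical, and the field positions assume a 32-bit instruction whose bit 0 is the most significant bit (so IR bits 6..10 are conventional bits 25..21 and the immediate is the low 16 bits).

```python
import operator


def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement number."""
    value &= (1 << bits) - 1
    if value & (1 << (bits - 1)):
        value -= 1 << bits
    return value


def id_stage(ir, npc, regs, is_branch, op=operator.eq):
    """Return (cond, target) as they might be produced during ID.

    The adder and the comparison run for every instruction; cond stays 0
    unless the decoded instruction is a branch whose test succeeds.
    """
    rs = (ir >> 21) & 0x1F              # IR bits 6..10 when bit 0 is the MSB (assumption)
    imm = sign_extend(ir & 0xFFFF, 16)  # assumption: 16-bit signed word offset
    target = npc + (imm << 2)           # separate adder, computed unconditionally
    test = op(regs[rs], 0)              # Regs[...] op 0, also computed unconditionally
    cond = 1 if (is_branch and test) else 0
    return cond, target


def jump_target(ir, pc):
    """Extra adder for jumps: PC plus the low 26 bits of IR shifted left by 2."""
    offset = sign_extend(ir & 0x3FFFFFF, 26)  # assumption: offset treated as signed
    return pc + (offset << 2)


def next_pc(npc, cond, target):
    """PC selection in IF, driven by the values ID produced."""
    return target if cond else npc
```

Because `cond` depends on `is_branch`, the sketch also shows why decoding has to finish before the end of ID: without it, a non-branch instruction whose register happens to be zero would redirect the PC.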

In some processors, branch hazards are even more expensive in clock cycles than in our example, since the time to evaluate the branch condition and compute the destination can be even longer. For example, a processor with separate decode and register fetch stages will probably have a branch delay (the length of the control hazard) that is at least 1 clock cycle longer. The branch delay, unless it is dealt with, turns into a branch penalty. Many older CPUs that implement more complex instruction sets have branch delays of 4 clock cycles or more, and large, deeply pipelined processors often have branch penalties of 6 or 7. In general, the deeper the pipeline, the worse the branch penalty in clock cycles. Of course, the relative performance effect of a longer branch penalty depends on the overall CPI of the processor. A high-CPI processor can afford to have more expensive branches, because the stall cycles that branches add are a smaller fraction of its total cycles per instruction, so the percentage of performance lost to branches is less.
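The dependence on overall CPI can be checked with a small calculation. The sketch below assumes the usual model in which branch stalls simply add to the base CPI; the function names and the sample numbers (20% branch frequency, a 3-cycle penalty, base CPIs of 1.0 and 3.0) are made up for illustration and do not come from the text.

```python
def effective_cpi(base_cpi, branch_freq, branch_penalty):
    """CPI once branch stall cycles are added to the base CPI."""
    return base_cpi + branch_freq * branch_penalty


def fraction_lost(base_cpi, branch_freq, branch_penalty):
    """Share of all cycles that are spent on branch stalls."""
    stall_cpi = branch_freq * branch_penalty
    return stall_cpi / effective_cpi(base_cpi, branch_freq, branch_penalty)


# Hypothetical machines: same branch behaviour, different base CPI.
for base in (1.0, 3.0):
    lost = fraction_lost(base, branch_freq=0.2, branch_penalty=3)
    print(f"base CPI {base}: {lost:.1%} of cycles lost to branch stalls")
```

With these inputs the stall cycles per instruction are identical (0.6) on both machines, so the share of cycles lost shrinks as the base CPI grows.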

C.4 What Makes Pipelining Hard to Implement?

Now that we understand how to detect and resolve hazards, we can deal with some complications that we have avoided so far. The first part of this section considers the challenges of exceptional situations where the instruction execution order is changed in unexpected ways. In the second part of this section, we discuss some of the challenges raised by different instruction sets.
