for this pipeline. In some processors—especially those with implicitly set condition codes for
more powerful (and hence slower) branch conditions—the branch target is known before the
branch outcome, and a predicted-taken scheme might make sense. In either a predicted-taken
or predicted-not-taken scheme, the compiler can improve performance by organizing the code
so that the most frequent path matches the hardware's choice. Our fourth scheme provides
more opportunities for the compiler to improve performance.
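The cost of a static prediction scheme depends only on how often the real outcome disagrees with the fixed guess, which is why arranging the code to match the hardware's choice pays off. The sketch below counts stall cycles under stated assumptions: the function name is hypothetical, each misprediction is assumed to cost one cycle, and a correct predicted-taken guess is assumed to cost nothing (which requires the target to be known early, unlike the five-stage pipeline). Inverting the outcomes models a compiler that flips the branch condition so the frequent path becomes the fall-through.

```python
def branch_stall_cycles(outcomes, predict_taken, penalty=1):
    """Count stall cycles for a static (fixed) branch prediction scheme.

    outcomes: sequence of booleans, True if the branch was actually taken.
    predict_taken: True models predicted-taken, False predicted-not-taken.
    penalty: assumed cycles lost per misprediction (hypothetical value).
    """
    # Correct guesses cost nothing; each wrong guess costs `penalty` cycles.
    return sum(penalty for taken in outcomes if taken != predict_taken)


# A loop-closing branch taken 9 times out of 10.
outcomes = [True] * 9 + [False]
print(branch_stall_cycles(outcomes, predict_taken=False))  # 9
print(branch_stall_cycles(outcomes, predict_taken=True))   # 1

# Compiler flips the branch sense so the frequent path falls through.
inverted = [not t for t in outcomes]
print(branch_stall_cycles(inverted, predict_taken=False))  # 1
```

With predicted-not-taken hardware, the same dynamic behavior drops from 9 stalls to 1 once the frequent path is made the fall-through path.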
A fourth scheme in use in some processors is called delayed branch. This technique was heavily
used in early RISC processors and works reasonably well in the five-stage pipeline. In a
delayed branch, the execution cycle with a branch delay of one is
    branch instruction
    sequential successor₁
    branch target if taken
The sequential successor is in the branch delay slot. This instruction is executed whether or
not the branch is taken. The pipeline behavior of the five-stage pipeline with a branch delay
is shown in Figure C.13. Although it is possible to have a branch delay longer than one, in
practice almost all processors with delayed branch have a single instruction delay; other
techniques are used if the pipeline has a longer potential branch penalty.
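The execution order above (branch, sequential successor, then target or fall-through) can be traced with a tiny interpreter. This is a minimal sketch of a hypothetical machine with exactly one delay slot: the tuple encoding, the `run_delayed` name, and the precomputed `taken` flag are all illustrative assumptions, not a real ISA. It rejects a branch in the delay slot rather than guessing what that should mean.

```python
def run_delayed(program, max_steps=100):
    """Return the indices of executed instructions, in order, for a
    hypothetical machine with a single branch delay slot.

    A branch is the tuple ("branch", taken, target): `taken` is the
    precomputed outcome and `target` an index into `program`. Any other
    tuple is an ordinary instruction that falls through.
    """
    trace = []
    pc = 0
    while 0 <= pc < len(program) and len(trace) < max_steps:
        instr = program[pc]
        trace.append(pc)
        if instr[0] != "branch":
            pc += 1
            continue
        _, taken, target = instr
        slot = pc + 1
        if slot < len(program):
            if program[slot][0] == "branch":
                raise ValueError(f"branch in delay slot at index {slot}")
            # The delay-slot instruction runs whether or not the branch is taken.
            trace.append(slot)
        # Only after the slot does control go to the target or fall through.
        pc = target if taken else slot + 1
    return trace


program = [
    ("add",),             # index 0
    ("branch", True, 4),  # index 1: the branch
    ("sub",),             # index 2: delay slot, always executed
    ("or",),              # index 3: skipped when the branch is taken
    ("and",),             # index 4: branch target
]
print(run_delayed(program))  # [0, 1, 2, 4]
```

If the branch at index 1 were untaken, the trace would be `[0, 1, 2, 3, 4]`: the delay slot runs in both cases, and only the instruction after it differs.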
FIGURE C.13 The behavior of a delayed branch is the same whether or not the branch
is taken. The instructions in the delay slot (there is only one delay slot for MIPS) are
executed. If the branch is untaken, execution continues with the instruction after the branch
delay instruction; if the branch is taken, execution continues at the branch target. When the
instruction in the branch delay slot is also a branch, the meaning is unclear: If the branch is not
taken, what should happen to the branch in the branch delay slot? Because of this confusion,
architectures with delayed branches often disallow putting a branch in the delay slot.
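An assembler or compiler for such an architecture could enforce the restriction with a simple scan. The sketch below assumes one delay slot per branch and a hypothetical tuple encoding in which a branch is any instruction whose first field is `"branch"`; the function names are illustrative only.

```python
def is_branch(instr):
    # Hypothetical encoding: a branch is any tuple whose first field is "branch".
    return instr[0] == "branch"


def branches_in_delay_slots(program):
    """Return the indices of delay slots (one slot per branch assumed)
    that hold another branch, which such an architecture would reject."""
    return [i + 1 for i in range(len(program) - 1)
            if is_branch(program[i]) and is_branch(program[i + 1])]


ok = [("branch", True, 3), ("nop",), ("add",), ("sub",)]
bad = [("branch", True, 3), ("branch", False, 0), ("add",), ("sub",)]
print(branches_in_delay_slots(ok))   # []
print(branches_in_delay_slots(bad))  # [1]
```

Rejecting the program outright sidesteps the ambiguity the caption describes instead of assigning it an arbitrary meaning.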
The job of the compiler is to make the successor instructions valid and useful. A number of
optimizations are used. Figure C.14 shows the three ways in which the branch delay can be
scheduled.
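One simple way a compiler can make the slot valid and, when possible, useful is sketched below under heavy simplifying assumptions: it looks only at the instruction immediately before the branch, moves it into the slot if the branch does not read that instruction's result, and otherwise pads the slot with a nop. The `Instr` class, its fields, and `fill_delay_slot` are hypothetical, and this is not claimed to reproduce any particular schedule in Figure C.14.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Instr:
    op: str
    dest: Optional[str] = None   # register written, if any
    srcs: Tuple[str, ...] = ()   # registers read
    is_branch: bool = False


NOP = Instr("nop")


def fill_delay_slot(block):
    """Schedule the delay slot of a basic block that ends in one branch.

    Moves the instruction just before the branch into the slot when the
    branch does not read its result; otherwise pads the slot with a nop.
    """
    if not block or not block[-1].is_branch:
        raise ValueError("block must end with a branch")
    *body, branch = block
    if body:
        candidate = body[-1]
        # Moving the candidate past the branch is safe only if the branch
        # does not depend on the value the candidate writes.
        if candidate.dest is None or candidate.dest not in branch.srcs:
            return body[:-1] + [branch, candidate]
    # No useful instruction found: a nop keeps the slot valid but wastes it.
    return body + [branch, NOP]


add = Instr("add", dest="r1", srcs=("r2", "r3"))
beqz_r4 = Instr("beqz", srcs=("r4",), is_branch=True)
beqz_r1 = Instr("beqz", srcs=("r1",), is_branch=True)
print([i.op for i in fill_delay_slot([add, beqz_r4])])  # ['beqz', 'add']
print([i.op for i in fill_delay_slot([add, beqz_r1])])  # ['add', 'beqz', 'nop']
```

In the second case the branch tests `r1`, which `add` produces, so moving `add` after the branch would change the outcome; the nop is the safe fallback.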