Pipelining: Basic and Intermediate Concepts - Computer Architecture: A Quantitative Approach

Reducing Pipeline Branch Penalties

There are many methods for dealing with the pipeline stalls caused by branch delay; we discuss four simple compile-time schemes in this subsection. In these four schemes the actions for a branch are static: they are fixed for each branch during the entire execution. The software can try to minimize the branch penalty using knowledge of the hardware scheme and of branch behavior. Chapter 3 looks at more powerful hardware and software techniques for both static and dynamic branch prediction.

The simplest scheme to handle branches is to freeze or flush the pipeline, holding or deleting any instructions after the branch until the branch destination is known. The attractiveness of this solution lies primarily in its simplicity for both hardware and software. It is the solution used earlier in the pipeline shown in Figure C.11. In this case, the branch penalty is fixed and cannot be reduced by software.
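To see what a fixed penalty costs, here is a minimal sketch. It assumes an ideal CPI of 1 and a 1-cycle stall on every branch; the function name effective_cpi and the 20% branch frequency are hypothetical values chosen for illustration, not figures from the text.

```python
def effective_cpi(branch_frequency, branch_penalty, ideal_cpi=1.0):
    """Average cycles per instruction when every branch pays a fixed stall.

    branch_frequency: fraction of executed instructions that are branches.
    branch_penalty:   stall cycles paid on each branch (fixed under freeze/flush).
    ideal_cpi:        CPI of the pipeline with no stalls (assumed to be 1).
    """
    if not 0.0 <= branch_frequency <= 1.0:
        raise ValueError("branch_frequency must be between 0 and 1")
    return ideal_cpi + branch_frequency * branch_penalty


# Hypothetical workload: 20% branches, 1 stall cycle per branch.
print(effective_cpi(0.20, 1))
```

Because the freeze or flush scheme pays the penalty on every branch regardless of outcome, the only way to lower this number is to change the hardware scheme, which is what the remaining schemes do.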

A higher-performance, and only slightly more complex, scheme is to treat every branch as not taken, simply allowing the hardware to continue as if the branch were not executed. Here, care must be taken not to change the processor state until the branch outcome is definitely known. The complexity of this scheme arises from having to know when the state might be changed by an instruction and how to “back out” such a change.

In the simple five-stage pipeline, this predicted-not-taken or predicted-untaken scheme is implemented by continuing to fetch instructions as if the branch were a normal instruction. The pipeline looks as if nothing out of the ordinary is happening. If the branch is taken, however, we need to turn the fetched instruction into a no-op and restart the fetch at the target address. Figure C.12 shows both situations.

FIGURE C.12 The predicted-not-taken scheme and the pipeline sequence when the branch is untaken (top) and taken (bottom). When the branch is determined during ID to be untaken, we fetch the fall-through and just continue. If the branch is determined during ID to be taken, we restart the fetch at the branch target. This causes all instructions following the branch to stall 1 clock cycle.
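The two sequences described in the caption can be sketched as a small timing table. This is an illustrative model only: it assumes clock cycles are numbered from 1, that the branch outcome and target are both known at the end of ID, and that a squashed instruction is shown as "idle" slots. The function names and the labels i+1, target, and target+1 are hypothetical.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def predicted_not_taken_schedule(branch_taken):
    """Return (label, start_cycle, stages) rows for a branch and its successors."""
    rows = [("branch", 1, STAGES)]
    if not branch_taken:
        # The fall-through instructions fetched behind the branch are kept.
        rows.append(("i+1", 2, STAGES))
        rows.append(("i+2", 3, STAGES))
    else:
        # i+1 was fetched in cycle 2, then turned into a no-op once the
        # branch resolved as taken in ID; fetch restarts at the target.
        rows.append(("i+1 (squashed)", 2, ["IF", "idle", "idle", "idle", "idle"]))
        rows.append(("target", 3, STAGES))
        rows.append(("target+1", 4, STAGES))
    return rows


def completion_cycle(row):
    """Clock cycle in which the row's last stage finishes."""
    _label, start, stages = row
    return start + len(stages) - 1


def format_schedule(rows):
    """Render the rows as a text timing diagram, one column per clock cycle."""
    last = max(completion_cycle(row) for row in rows)
    lines = []
    for label, start, stages in rows:
        cells = [""] * last
        for offset, stage in enumerate(stages):
            cells[start - 1 + offset] = stage
        lines.append(f"{label:<15}" + "".join(f"{c:<5}" for c in cells).rstrip())
    return "\n".join(lines)


for taken in (False, True):
    print("branch taken" if taken else "branch untaken")
    print(format_schedule(predicted_not_taken_schedule(taken)))
    print()
```

In this model the first useful instruction after an untaken branch starts in cycle 2, while after a taken branch the target starts in cycle 3, which is the 1-cycle stall the caption describes.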

An alternative scheme is to treat every branch as taken. As soon as the branch is decoded and the target address is computed, we assume the branch to be taken and begin fetching and executing at the target. Because in our five-stage pipeline we don't know the target address any earlier than we know the branch outcome, there is no advantage in this approach
