Reducing Pipeline Branch Penalties
There are many methods for dealing with the pipeline stalls caused by branch delay; we discuss four simple compile-time schemes in this subsection. In these four schemes the actions for a branch are static: they are fixed for each branch during the entire execution. The software can try to minimize the branch penalty using knowledge of the hardware scheme and of branch behavior. Chapter 3 looks at more powerful hardware and software techniques for both static and dynamic branch prediction.
The simplest scheme to handle branches is to freeze or flush the pipeline, holding or deleting any instructions after the branch until the branch destination is known. The attractiveness of this solution lies primarily in its simplicity for both hardware and software. It is the solution used earlier in the pipeline shown in Figure C.11. In this case, the branch penalty is fixed and cannot be reduced by software.
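Because every branch pays the same penalty under freeze/flush, its cost is easy to estimate. The sketch below is a back-of-the-envelope model, not a simulator: it assumes an ideal base CPI of 1.0 and a 1-cycle penalty per branch for this five-stage pipeline, and the function name and the 20% branch frequency are hypothetical.

```python
def flush_cpi(branch_freq: float, penalty_cycles: int = 1, base_cpi: float = 1.0) -> float:
    """Effective CPI when every branch freezes/flushes the pipeline.

    Assumptions (hypothetical): ideal base CPI of 1.0, 1-cycle penalty.
    Every branch pays the penalty whatever its outcome, so the fraction of
    taken branches does not appear: software cannot reduce this cost.
    """
    return base_cpi + branch_freq * penalty_cycles


# Hypothetical workload: 20% of instructions are branches.
print(flush_cpi(0.20))  # 1.2
```

Note that no input describing branch behavior appears in the formula, which is exactly why the penalty is fixed for this scheme.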
A higher-performance, and only slightly more complex, scheme is to treat every branch as not taken, simply allowing the hardware to continue as if the branch were not executed. Here, care must be taken not to change the processor state until the branch outcome is definitely known. The complexity of this scheme arises from having to know when the state might be changed by an instruction and how to "back out" such a change.
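Why predicting not taken performs better can be seen with a simple cost model: a not-taken branch costs nothing extra, so only taken branches pay the penalty, whereas freezing or flushing charges every branch. The sketch assumes an ideal base CPI of 1.0 and a 1-cycle penalty on a taken branch; the function name and the workload numbers are hypothetical.

```python
def predict_not_taken_cpi(branch_freq: float, taken_frac: float,
                          penalty_cycles: int = 1, base_cpi: float = 1.0) -> float:
    """Effective CPI under predict-not-taken (hypothetical cost model).

    Only taken branches pay: the fall-through instruction that was already
    fetched must be discarded and fetch restarted at the target.
    """
    return base_cpi + branch_freq * taken_frac * penalty_cycles


# Hypothetical workload: 20% branches, 60% of which are taken.
print(predict_not_taken_cpi(0.20, 0.60))
```

With these numbers the branch contribution is 0.2 × 0.6 × 1 = 0.12 cycles per instruction, compared with 0.2 × 1 = 0.2 if every branch paid the penalty. When every branch is taken, the two schemes cost the same.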
In the simple five-stage pipeline, this predicted-not-taken or predicted-untaken scheme is implemented by continuing to fetch instructions as if the branch were a normal instruction. The pipeline looks as if nothing out of the ordinary is happening. If the branch is taken, however, we need to turn the fetched instruction into a no-op and restart the fetch at the target address. Figure C.12 shows both situations.
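The fetch behavior just described can be sketched as a tiny trace generator. This is an illustration under stated assumptions, not a model of real hardware: the `Instr` record, its `target` index and the up-front `taken` flag are hypothetical conveniences (a real pipeline learns the outcome only in ID), and each entry in the returned list stands for one fetch slot.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Instr:
    name: str
    target: Optional[int] = None  # hypothetical: index of branch target, None if not a branch
    taken: bool = False           # hypothetical: outcome supplied up front for the sketch


def fetch_sequence(program: List[Instr], max_slots: int = 32) -> List[str]:
    """Fetch slots under predict-not-taken, with branches resolved in ID.

    The instruction after a branch is always fetched as the fall-through.
    If the branch turns out taken, that slot becomes a no-op and fetch
    restarts at the target. max_slots guards against endless loops.
    """
    slots: List[str] = []
    pc = 0
    while pc < len(program) and len(slots) < max_slots:
        instr = program[pc]
        slots.append(instr.name)
        if instr.target is not None and instr.taken:
            # The fall-through was fetched while the branch sat in ID: squash it.
            slots.append("nop")
            pc = instr.target
        else:
            pc += 1
    return slots


prog = [Instr("add"), Instr("beq", target=4, taken=True),
        Instr("sub"), Instr("or"), Instr("and")]
print(fetch_sequence(prog))  # ['add', 'beq', 'nop', 'and']
```

With `taken=False` on the branch, the same program yields `['add', 'beq', 'sub', 'or', 'and']`: nothing out of the ordinary happens, and no slot is wasted.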
FIGURE C.12 The predicted-not-taken scheme and the pipeline sequence when the branch is untaken (top) and taken (bottom). When the branch is untaken, determined during ID, we fetch the fall-through and just continue. If the branch is taken during ID, we restart the fetch at the branch target. This causes all instructions following the branch to stall 1 clock cycle.
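The two pipeline sequences in the caption can be regenerated as cycle-by-cycle tables. This is a sketch of the kind of diagram the caption describes, assuming the classic IF/ID/EX/MEM/WB stages and a branch resolved in ID; the helper names and the "idle" label for the squashed instruction are illustrative choices, not taken from the figure itself.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def timing_rows(taken: bool, followers: int = 3):
    """Rows of (label, first_cycle, stages) for a branch and `followers`
    later instructions under predict-not-taken (branch resolved in ID)."""
    rows = [("branch", 1, STAGES)]
    if not taken:
        # Fall-through instructions issue back to back, as if nothing happened.
        for k in range(1, followers + 1):
            rows.append((f"i+{k}", 1 + k, STAGES))
    else:
        # i+1 was fetched in cycle 2 but becomes a no-op once ID finds the branch taken.
        rows.append(("i+1", 2, ["IF"] + ["idle"] * 4))
        # Fetch restarts at the branch target in cycle 3.
        for k in range(followers - 1):
            label = "target" if k == 0 else f"target+{k}"
            rows.append((label, 3 + k, STAGES))
    return rows


def render(rows) -> str:
    """Lay the rows out as a table with one column per clock cycle."""
    n_cycles = max(start + len(stages) - 1 for _, start, stages in rows)
    header = f"{'':<10}" + "".join(f"{c:<6}" for c in range(1, n_cycles + 1))
    lines = [header.rstrip()]
    for label, start, stages in rows:
        cells = [""] * n_cycles
        for j, stage in enumerate(stages):
            cells[start - 1 + j] = stage
        lines.append((f"{label:<10}" + "".join(f"{c:<6}" for c in cells)).rstrip())
    return "\n".join(lines)


print(render(timing_rows(taken=False)))
print()
print(render(timing_rows(taken=True)))
```

In the taken case the target enters IF in cycle 3, the same cycle in which i+2 would have been fetched had the branch fallen through; the wasted i+1 slot is the 1-cycle penalty.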
An alternative scheme is to treat every branch as taken. As soon as the branch is decoded and the target address is computed, we assume the branch to be taken and begin fetching and executing at the target. Because in our five-stage pipeline we don't know the target address any earlier than we know the branch outcome, there is no advantage in this approach