Hardware Reference
In-Depth Information
Example
Determine the total branch penalty for a branch-target buffer assuming the pen-
alty cycles for individual mispredictions from Figure 3.23 . Make the following
assumptions about the prediction accuracy and hit rate:
■ Prediction accuracy is 90% (for instructions in the buffer).
■ Hit rate in the buffer is 90% (for branches predicted taken).
Answer
We compute the penalty by looking at the probability of two events: the branch
is predicted taken but ends up being not taken, and the branch is taken but is not
found in the buffer. Both carry a penalty of two cycles.
This penalty compares with a branch penalty for delayed branches, which
we evaluate in Appendix C , of about 0.5 clock cycles per branch. Remember,
though, that the improvement from dynamic branch prediction will grow as the
pipeline length and, hence, the branch delay grows; in addition, beter predictors
will yield a larger performance advantage. Modern high-performance processors
have branch misprediction delays on the order of 15 clock cycles; clearly, accur-
ate prediction is critical!
One variation on the branch-target buffer is to store one or more target instructions instead of,
or in addition to, the predicted target address . This variation has two potential advantages. First,
it allows the branch-target buffer access to take longer than the time between successive instruc-
tion fetches, possibly allowing a larger branch-target buffer. Second, buffering the actual target
instructions allows us to perform an optimization called branch folding . Branch folding can be
used to obtain 0-cycle unconditional branches and sometimes 0-cycle conditional branches.
Consider a branch-target buffer that buffers instructions from the predicted path and is being
accessed with the address of an unconditional branch. The only function of the unconditional
branch is to change the PC. Thus, when the branch-target buffer signals a hit and indicates that
the branch is unconditional, the pipeline can simply substitute the instruction from the branch-
target buffer in place of the instruction that is returned from the cache (which is the uncondi-
tional branch). If the processor is issuing multiple instructions per cycle, then the buffer will
need to supply multiple instructions to obtain the maximum benefit. In some cases, it may be
possible to eliminate the cost of a conditional branch.
Search WWH ::




Custom Search