Instruction-Level Parallelism and Its Exploitation - Computer Architecture: A Quantitative Approach

Hardware Reference

In-Depth Information

Example

Determine the total branch penalty for a branch-target buffer assuming the pen-

alty cycles for individual mispredictions from Figure 3.23 . Make the following

assumptions about the prediction accuracy and hit rate:

■ Prediction accuracy is 90% (for instructions in the buffer).

■ Hit rate in the buffer is 90% (for branches predicted taken).

Answer

We compute the penalty by looking at the probability of two events: the branch

is predicted taken but ends up being not taken, and the branch is taken but is not

found in the buffer. Both carry a penalty of two cycles.

This penalty compares with a branch penalty for delayed branches, which

we evaluate in Appendix C , of about 0.5 clock cycles per branch. Remember,

though, that the improvement from dynamic branch prediction will grow as the

pipeline length and, hence, the branch delay grows; in addition, beter predictors

will yield a larger performance advantage. Modern high-performance processors

have branch misprediction delays on the order of 15 clock cycles; clearly, accur-

ate prediction is critical!

One variation on the branch-target buffer is to store one or more target instructions instead of,

or in addition to, the predicted target address . This variation has two potential advantages. First,

it allows the branch-target buffer access to take longer than the time between successive instruc-

tion fetches, possibly allowing a larger branch-target buffer. Second, buffering the actual target

instructions allows us to perform an optimization called branch folding . Branch folding can be

used to obtain 0-cycle unconditional branches and sometimes 0-cycle conditional branches.

Consider a branch-target buffer that buffers instructions from the predicted path and is being

accessed with the address of an unconditional branch. The only function of the unconditional

branch is to change the PC. Thus, when the branch-target buffer signals a hit and indicates that

the branch is unconditional, the pipeline can simply substitute the instruction from the branch-

target buffer in place of the instruction that is returned from the cache (which is the uncondi-

tional branch). If the processor is issuing multiple instructions per cycle, then the buffer will

need to supply multiple instructions to obtain the maximum benefit. In some cases, it may be

possible to eliminate the cost of a conditional branch.

Search WWH ::

Custom Search

Home