Pipelining: Basic and Intermediate Concepts - Computer Architecture: A Quantitative Approach


Unconditional branch           4%
Conditional branch, untaken    6%
Conditional branch, taken     10%

Answer

We find the CPIs by multiplying the relative frequency of unconditional, conditional untaken, and conditional taken branches by the respective penalties. The results are shown in Figure C.16.

FIGURE C.16 CPI penalties for three branch-prediction schemes and a deeper pipeline.

The differences among the schemes are substantially increased with this longer delay. If the base CPI were 1 and branches were the only source of stalls, the ideal pipeline would be 1.56 times faster than a pipeline that used the stall-pipeline scheme. The predicted-untaken scheme would be 1.13 times better than the stall-pipeline scheme under the same assumptions.
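The arithmetic behind these ratios can be reproduced with a short sketch. The branch frequencies come from the table above. The per-branch penalties (in clock cycles) are not listed in this excerpt, so the values below are assumptions chosen to be consistent with the quoted 1.56 and 1.13 figures; the scheme names and the helper functions `cpi_penalty` and `speedup` are likewise illustrative.

```python
# Branch frequencies from the table above, as fractions of all instructions.
FREQ = {
    "unconditional": 0.04,
    "conditional_untaken": 0.06,
    "conditional_taken": 0.10,
}

# ASSUMED penalties in clock cycles for the deeper pipeline. They are not
# given in this excerpt; they are chosen to match the ratios quoted in the text.
PENALTY = {
    "stall_pipeline":    {"unconditional": 2, "conditional_untaken": 3, "conditional_taken": 3},
    "predicted_taken":   {"unconditional": 2, "conditional_untaken": 3, "conditional_taken": 2},
    "predicted_untaken": {"unconditional": 2, "conditional_untaken": 0, "conditional_taken": 3},
}

def cpi_penalty(scheme):
    """Sum of frequency times penalty over the three branch categories."""
    return sum(FREQ[kind] * PENALTY[scheme][kind] for kind in FREQ)

def speedup(base_cpi, slow_penalty, fast_penalty):
    """Ratio of the slower pipeline's CPI to the faster pipeline's CPI."""
    return (base_cpi + slow_penalty) / (base_cpi + fast_penalty)

stall = cpi_penalty("stall_pipeline")
untaken = cpi_penalty("predicted_untaken")

# Ideal pipeline (no branch stalls) versus the stall-pipeline scheme.
ideal_vs_stall = round(speedup(1.0, stall, 0.0), 2)
# Predicted-untaken scheme versus the stall-pipeline scheme.
untaken_vs_stall = round(speedup(1.0, stall, untaken), 2)

print(ideal_vs_stall, untaken_vs_stall)
```

Under these assumed penalties the stall-pipeline scheme adds 0.56 cycles per instruction and the predicted-untaken scheme adds 0.38, which gives the two ratios stated in the text.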

Reducing The Cost Of Branches Through Prediction

As pipelines get deeper and the potential penalty of branches increases, using delayed branches and similar schemes becomes insufficient. Instead, we need to turn to more aggressive means for predicting branches. Such schemes fall into two classes: low-cost static schemes that rely on information available at compile time and strategies that predict branches dynamically based on program behavior. We discuss both approaches here.

Static Branch Prediction

A key way to improve compile-time branch prediction is to use profile information collected from earlier runs. The key observation that makes this worthwhile is that the behavior of branches is often bimodally distributed; that is, an individual branch is often highly biased toward taken or untaken. Figure C.17 shows the success of branch prediction using this strategy. The same input data were used for runs and for collecting the profile; other studies have
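As a rough illustration of profile-based static prediction, the sketch below counts each branch's outcomes in a profiling run and then fixes one prediction per branch: the direction seen most often. The trace format, the branch names, and the functions `build_profile`, `static_predictions`, and `misprediction_rate` are all hypothetical; a real compiler would gather the counts through instrumentation and encode the result in the generated code.

```python
from collections import defaultdict

def build_profile(trace):
    """Count outcomes per branch. trace holds (branch_id, taken) pairs."""
    counts = defaultdict(lambda: [0, 0])  # branch_id -> [untaken, taken]
    for branch_id, taken in trace:
        counts[branch_id][int(taken)] += 1
    return counts

def static_predictions(counts):
    """Predict taken for a branch if it was taken at least as often as not."""
    return {branch: c[1] >= c[0] for branch, c in counts.items()}

def misprediction_rate(trace, predictions, default=False):
    """Fraction of dynamic branches whose fixed prediction was wrong."""
    trace = list(trace)
    if not trace:
        return 0.0
    wrong = sum(1 for branch, taken in trace
                if predictions.get(branch, default) != taken)
    return wrong / len(trace)

# Hypothetical trace: a loop branch taken 9 times out of 10, and an
# error check that is never taken. Both are strongly biased.
trace = [("loop", True)] * 9 + [("loop", False)] + [("error_check", False)] * 10

profile = build_profile(trace)
preds = static_predictions(profile)
rate = misprediction_rate(trace, preds)  # same data used for profile and run
print(preds, rate)
```

Because the two branches in this made-up trace are strongly biased, a single fixed prediction per branch mispredicts only the one loop exit, which is the bimodal behavior the paragraph describes. Evaluating on the same trace that produced the profile mirrors the setup described above and is the most favorable case for the predictor.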
