Unconditional branch           4%
Conditional branch, untaken    6%
Conditional branch, taken     10%
Answer
We find the CPI penalties by multiplying the relative frequencies of unconditional, conditional untaken, and conditional taken branches by the respective penalties. The results are shown in Figure C.16.
FIGURE C.16 CPI penalties for three branch-prediction schemes and a deeper pipeline.
The differences among the schemes grow substantially with this longer delay. If the base CPI were 1 and branches were the only source of stalls, the ideal pipeline would be 1.56 times faster than a pipeline that used the stall-pipeline scheme, whose branch stalls raise the CPI from 1 to 1.56. The predicted-untaken scheme would be 1.13 times faster than the stall-pipeline scheme under the same assumptions.
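The arithmetic behind this example can be sketched in a few lines of Python: the CPI added by branches is the sum, over the three branch kinds, of frequency times penalty, and the speedup is a ratio of CPIs. The frequencies come from the example. The per-scheme penalties (2, 3, and 3 cycles for stall-pipeline; 2, 0, and 3 cycles for predicted-untaken) are assumptions, chosen because they are consistent with the 1.56 and 1.13 ratios quoted above. The helper names `branch_cpi_penalty` and `speedup` are hypothetical.

```python
# Minimal sketch: CPI added by branches = sum(frequency x penalty).
# ASSUMPTION: the per-scheme penalties below are illustrative values chosen
# to be consistent with the 1.56 and 1.13 ratios quoted in the text.

# Fraction of all instructions that are each kind of branch (from the example).
FREQUENCIES = {
    "unconditional": 0.04,
    "untaken": 0.06,
    "taken": 0.10,
}

# Hypothetical branch penalties, in clock cycles, for each scheme.
PENALTIES = {
    "stall pipeline": {"unconditional": 2, "untaken": 3, "taken": 3},
    "predicted untaken": {"unconditional": 2, "untaken": 0, "taken": 3},
}

BASE_CPI = 1.0  # ideal pipeline: branches are the only source of stalls


def branch_cpi_penalty(penalties, frequencies=FREQUENCIES):
    """Return the CPI added by branches: sum of frequency x penalty."""
    return sum(frequencies[kind] * penalties[kind] for kind in frequencies)


def speedup(slower_cpi, faster_cpi):
    """Speedup as a CPI ratio, assuming equal clock rate and instruction count."""
    return slower_cpi / faster_cpi


if __name__ == "__main__":
    cpi = {s: BASE_CPI + branch_cpi_penalty(p) for s, p in PENALTIES.items()}
    for scheme, value in cpi.items():
        print(f"{scheme}: CPI = {value:.2f}")
    print(f"ideal vs. stall pipeline: "
          f"{speedup(cpi['stall pipeline'], BASE_CPI):.2f}x")
    print(f"predicted untaken vs. stall pipeline: "
          f"{speedup(cpi['stall pipeline'], cpi['predicted untaken']):.2f}x")
```

With these assumed penalties, the stall-pipeline scheme adds 0.56 to the CPI and predicted-untaken adds 0.38, so the ratios come out to 1.56 and 1.38 / 1.56 inverted, that is, 1.56 / 1.38 ≈ 1.13. Treating speedup as a plain CPI ratio is valid here only because the schemes are assumed to share the same clock rate and instruction count.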
Reducing The Cost Of Branches Through Prediction
As pipelines get deeper and the potential penalty of branches increases, delayed branches and similar schemes become insufficient. Instead, we need more aggressive means of predicting branches. Such schemes fall into two classes: low-cost static schemes that rely on information available at compile time, and strategies that predict branches dynamically based on program behavior. We discuss both approaches here.
Static Branch Prediction
A key way to improve compile-time branch prediction is to use profile information collected from earlier runs. The observation that makes this worthwhile is that branch behavior is often bimodally distributed; that is, an individual branch is often highly biased toward taken or untaken. Figure C.17 shows the success of branch prediction using this strategy. The same input data were used for runs and for collecting the profile; other studies have