Hardware Reference
In-Depth Information
Correlating Branch Predictors
The 2-bit predictor schemes use only the recent behavior of a single branch to predict the fu-
ture behavior of that branch. It may be possible to improve the prediction accuracy if we also
look at the recent behavior of other branches rather than just the branch we are trying to pre-
dict. Consider a small code fragment from the eqntot benchmark, a member of early SPEC
benchmark suites that displayed particularly bad branch prediction behavior:
if (aa==2)
aa=0;
if (bb==2)
bb=0;
if (aa!=bb) {
Here is the MIPS code that we would typically generate for this code fragment assuming
that aa and bb are assigned to registers R1 and R2 :
DADDIU R3,R1,#−2
BNEZ R3,L1 ;branch b1 (aa!=2)
DADD R1,R0,R0 ;aa=0
L1: DADDIU R3,R2,#−2
BNEZ R3,L2 ;branch b2 (bb!=2)
DADD R2,R0,R0 ;bb=0
L2: DSUBU R3,R1,R2 ;R3=aa-bb
BEQZ R3,L3 ;branch b3 (aa==bb)
Let's label these branches b1, b2, and b3. The key observation is that the behavior of branch
b3 is correlated with the behavior of branches b1 and b2. Clearly, if branches b1 and b2 are
both not taken (i.e., if the conditions both evaluate to true and aa and bb are both assigned 0),
then b3 will be taken, since aa and bb are clearly equal. A predictor that uses only the behavior
of a single branch to predict the outcome of that branch can never capture this behavior.
Branch predictors that use the behavior of other branches to make a prediction are called
correlating predictors or two-level predictors . Existing correlating predictors add information
about the behavior of the most recent branches to decide how to predict a given branch. For
example, a (1,2) predictor uses the behavior of the last branch to choose from among a pair of
2-bit branch predictors in predicting a particular branch. In the general case, an ( m,n ) predictor
uses the behavior of the last m branches to choose from 2 m branch predictors, each of which is
an n -bit predictor for a single branch. The atraction of this type of correlating branch predict-
or is that it can yield higher prediction rates than the 2-bit scheme and requires only a trivial
amount of additional hardware.
The simplicity of the hardware comes from a simple observation: The global history of the
most recent m branches can be recorded in an m -bit shift register, where each bit records
whether the branch was taken or not taken. The branch-prediction buffer can then be indexed
using a concatenation of the low-order bits from the branch address with the m -bit global his-
tory. For example, in a (2,2) buffer with 64 total entries, the 4 low-order address bits of the
branch (word address) and the 2 global bits representing the behavior of the two most recently
executed branches form a 6-bit index that can be used to index the 64 counters.
How much beter do the correlating branch predictors work when compared with the stand-
ard 2-bit scheme? To compare them fairly, we must compare predictors that use the same
number of state bits. The number of bits in an ( m , n ) predictor is
Search WWH ::




Custom Search