Instruction-Level Parallelism and Its Exploitation - Computer Architecture: A Quantitative Approach

Hardware Reference

In-Depth Information

Correlating Branch Predictors

The 2-bit predictor schemes use only the recent behavior of a single branch to predict the fu-

ture behavior of that branch. It may be possible to improve the prediction accuracy if we also

look at the recent behavior of other branches rather than just the branch we are trying to pre-

dict. Consider a small code fragment from the eqntot benchmark, a member of early SPEC

benchmark suites that displayed particularly bad branch prediction behavior:

if (aa==2)

aa=0;

if (bb==2)

bb=0;

if (aa!=bb) {

Here is the MIPS code that we would typically generate for this code fragment assuming

that aa and bb are assigned to registers R1 and R2 :

DADDIU R3,R1,#−2

BNEZ R3,L1 ;branch b1 (aa!=2)

DADD R1,R0,R0 ;aa=0

L1: DADDIU R3,R2,#−2

BNEZ R3,L2 ;branch b2 (bb!=2)

DADD R2,R0,R0 ;bb=0

L2: DSUBU R3,R1,R2 ;R3=aa-bb

BEQZ R3,L3 ;branch b3 (aa==bb)

Let's label these branches b1, b2, and b3. The key observation is that the behavior of branch

b3 is correlated with the behavior of branches b1 and b2. Clearly, if branches b1 and b2 are

both not taken (i.e., if the conditions both evaluate to true and aa and bb are both assigned 0),

then b3 will be taken, since aa and bb are clearly equal. A predictor that uses only the behavior

of a single branch to predict the outcome of that branch can never capture this behavior.

Branch predictors that use the behavior of other branches to make a prediction are called

correlating predictors or two-level predictors . Existing correlating predictors add information

about the behavior of the most recent branches to decide how to predict a given branch. For

example, a (1,2) predictor uses the behavior of the last branch to choose from among a pair of

2-bit branch predictors in predicting a particular branch. In the general case, an ( m,n ) predictor

uses the behavior of the last m branches to choose from 2 m branch predictors, each of which is

an n -bit predictor for a single branch. The atraction of this type of correlating branch predict-

or is that it can yield higher prediction rates than the 2-bit scheme and requires only a trivial

amount of additional hardware.

The simplicity of the hardware comes from a simple observation: The global history of the

most recent m branches can be recorded in an m -bit shift register, where each bit records

whether the branch was taken or not taken. The branch-prediction buffer can then be indexed

using a concatenation of the low-order bits from the branch address with the m -bit global his-

tory. For example, in a (2,2) buffer with 64 total entries, the 4 low-order address bits of the

branch (word address) and the 2 global bits representing the behavior of the two most recently

executed branches form a 6-bit index that can be used to index the 64 counters.

How much beter do the correlating branch predictors work when compared with the stand-

ard 2-bit scheme? To compare them fairly, we must compare predictors that use the same

number of state bits. The number of bits in an ( m , n ) predictor is

Computer Architecture: A Quantitative Approach

Search WWH ::

Custom Search

Home