Instruction-Level Parallelism and Its Exploitation - Computer Architecture: A Quantitative Approach

Because a branch-target buffer predicts the next instruction address and will send it out before decoding the instruction, we must know whether the fetched instruction is predicted as a taken branch. If the PC of the fetched instruction matches an address in the prediction buffer, then the corresponding predicted PC is used as the next PC. The hardware for this branch-target buffer is essentially identical to the hardware for a cache.

If a matching entry is found in the branch-target buffer, fetching begins immediately at the predicted PC. Note that unlike a branch-prediction buffer, the predictive entry must be matched to this instruction because the predicted PC will be sent out before it is known whether this instruction is even a branch. If the processor did not check whether the entry matched this PC, then the wrong PC would be sent out for instructions that were not branches, resulting in worse performance. We only need to store the predicted-taken branches in the branch-target buffer, since an untaken branch should simply fetch the next sequential instruction, as if it were not a branch.
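
The lookup described above can be sketched in software as a small direct-mapped table. This is only an illustrative model, not the hardware design from the text: the class and method names, the 16-entry size, the 4-byte instruction width, and the choice to evict an entry when its branch turns out not taken are all assumptions made for the example.

```python
from dataclasses import dataclass

INSTRUCTION_BYTES = 4  # assumption: fixed-width 4-byte instructions


@dataclass
class BTBEntry:
    branch_pc: int     # full PC of the branch, used as the tag
    predicted_pc: int  # address to fetch next when the tag matches


class BranchTargetBuffer:
    """Hypothetical direct-mapped model holding only predicted-taken branches."""

    def __init__(self, num_entries=16):
        self.num_entries = num_entries
        self.entries = [None] * num_entries

    def _index(self, pc):
        # Low-order bits of the word address select the slot, as in a cache.
        return (pc // INSTRUCTION_BYTES) % self.num_entries

    def next_pc(self, pc):
        """Return the PC to fetch next, before the instruction is decoded."""
        entry = self.entries[self._index(pc)]
        # The tag check matters: without it, a non-branch instruction that
        # maps to the same slot would be redirected to a wrong address.
        if entry is not None and entry.branch_pc == pc:
            return entry.predicted_pc
        return pc + INSTRUCTION_BYTES  # no match: fetch sequentially

    def update(self, pc, taken, target):
        """Record the resolved outcome of the branch at `pc`."""
        i = self._index(pc)
        if taken:
            self.entries[i] = BTBEntry(pc, target)
        elif self.entries[i] is not None and self.entries[i].branch_pc == pc:
            # Assumed policy: an untaken branch is dropped, so it fetches
            # the next sequential instruction as if it were not a branch.
            self.entries[i] = None


btb = BranchTargetBuffer()
btb.update(0x100, taken=True, target=0x200)
print(hex(btb.next_pc(0x100)))  # tag matches, so the predicted PC is used
print(hex(btb.next_pc(0x140)))  # same slot, different tag: sequential fetch
```

In this sketch the addresses 0x100 and 0x140 map to the same slot, which is exactly the case where skipping the tag comparison would send out a wrong PC.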

Figure 3.22 shows the steps when using a branch-target buffer for a simple five-stage pipeline. From this figure we can see that there will be no branch delay if a branch-prediction entry is found in the buffer and the prediction is correct. Otherwise, there will be a penalty of at least two clock cycles. Dealing with the mispredictions and misses is a significant challenge, since we typically will have to halt instruction fetch while we rewrite the buffer entry. Thus, we would like to make this process fast to minimize the penalty.
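
To get a rough feel for the cost, the average penalty per branch can be estimated from the two penalty cases. The function below is a back-of-the-envelope sketch under stated assumptions: the function name, its parameters, and the sample rates are hypothetical, the penalty is fixed at the two-cycle minimum, and a branch that misses in the buffer and is not taken is assumed to cost nothing, since sequential fetch was already correct.

```python
def expected_branch_penalty(hit_rate, accuracy_on_hit, taken_rate_on_miss,
                            penalty_cycles=2.0):
    """Estimate average penalty cycles per branch (illustrative model only).

    hit_rate:           fraction of branches found in the buffer
    accuracy_on_hit:    fraction of buffer hits whose prediction is correct
    taken_rate_on_miss: fraction of buffer misses that are actually taken
    penalty_cycles:     cost of each penalized case (two cycles minimum)
    """
    # Case 1: entry found in the buffer, but the prediction was wrong.
    wrong_prediction = hit_rate * (1.0 - accuracy_on_hit)
    # Case 2: no entry in the buffer, but the branch was taken.
    missed_taken = (1.0 - hit_rate) * taken_rate_on_miss
    return (wrong_prediction + missed_taken) * penalty_cycles


# Hypothetical rates, chosen only to exercise the formula.
print(expected_branch_penalty(hit_rate=0.9, accuracy_on_hit=0.9,
                              taken_rate_on_miss=0.6))
```

Because the text gives two cycles as a lower bound, any figure from this model should be read as a floor on the real penalty, not a measurement.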
