Instruction-Level Parallelism and Its Exploitation - Computer Architecture: A Quantitative Approach


Branch PC (word address)    Outcome
454                         T
543                         NT
777                         NT
543                         NT
777                         NT
454                         T
777                         NT
454                         T
543                         T

3.18 [10] <3.9> Suppose we have a deeply pipelined processor, for which we implement a branch-target buffer for the conditional branches only. Assume that the misprediction penalty is always four cycles and the buffer miss penalty is always three cycles. Assume a 90% hit rate, 90% accuracy, and 15% branch frequency. How much faster is the processor with the branch-target buffer versus a processor that has a fixed two-cycle branch penalty? Assume a base CPI (clock cycles per instruction) of one without branch stalls.
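
The arithmetic behind exercise 3.18 can be sketched in a few lines of Python. This is one possible reading of the problem, not an official solution: it assumes the 90% accuracy applies only to branches that hit in the buffer, and that every buffer miss pays the full three-cycle penalty regardless of the branch outcome. The function and parameter names are made up for illustration.

```python
def cpi_with_btb(base_cpi, branch_freq, hit_rate, accuracy,
                 mispredict_penalty, miss_penalty):
    """CPI when branch stalls come only from BTB mispredictions and misses."""
    # Assumption: accuracy is measured over buffer hits only, and each
    # buffer miss always costs the full miss penalty.
    stall_per_branch = (hit_rate * (1.0 - accuracy) * mispredict_penalty
                        + (1.0 - hit_rate) * miss_penalty)
    return base_cpi + branch_freq * stall_per_branch


def cpi_fixed_penalty(base_cpi, branch_freq, penalty):
    """CPI when every branch pays the same fixed stall penalty."""
    return base_cpi + branch_freq * penalty


# Values taken from the exercise statement.
btb = cpi_with_btb(1.0, 0.15, 0.90, 0.90, 4, 3)
fixed = cpi_fixed_penalty(1.0, 0.15, 2)
speedup = fixed / btb

print(f"CPI with branch-target buffer: {btb:.3f}")
print(f"CPI with fixed penalty:        {fixed:.3f}")
print(f"Speedup:                       {speedup:.3f}")
```

Under these assumptions the sketch gives a CPI of about 1.099 with the buffer, 1.300 with the fixed penalty, and a speedup of about 1.18. A different interpretation of "accuracy" or of the miss penalty would change the per-branch stall term.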

3.19 [10/5] <3.9> Consider a branch-target buffer that has penalties of zero, two, and two clock cycles for correct conditional branch prediction, incorrect prediction, and a buffer miss, respectively. Consider a branch-target buffer design that distinguishes conditional and unconditional branches, storing the target address for a conditional branch and the target instruction for an unconditional branch.

a. [10] <3.9> What is the penalty in clock cycles when an unconditional branch is found in the buffer?

b. [10] <3.9> Determine the improvement from branch folding for unconditional branches. Assume a 90% hit rate, an unconditional branch frequency of 5%, and a two-cycle penalty for a buffer miss. How much improvement is gained by this enhancement? How high must the hit rate be for this enhancement to provide a performance gain?
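
The CPI contribution of unconditional branches in part (b) can be explored with a small Python sketch. It rests on an assumption the exercise leaves to the reader: that a folded unconditional branch found in the buffer has an effective penalty of -1 cycle, because the stored target instruction takes the place of the branch. The helper name and the -1 value are illustrative, not taken from the text.

```python
def unconditional_branch_cpi_term(branch_freq, hit_rate,
                                  hit_penalty, miss_penalty):
    """Average stall cycles per instruction due to unconditional branches."""
    return branch_freq * (hit_rate * hit_penalty
                          + (1.0 - hit_rate) * miss_penalty)


# Buffer stores only the target address: a hit costs zero cycles.
without_folding = unconditional_branch_cpi_term(0.05, 0.90, 0, 2)

# Buffer stores the target instruction (branch folding).
# Assumption: a hit saves one cycle, modeled as a penalty of -1.
with_folding = unconditional_branch_cpi_term(0.05, 0.90, -1, 2)

print(f"CPI term without folding: {without_folding:+.3f}")
print(f"CPI term with folding:    {with_folding:+.3f}")
print(f"CPI reduction:            {without_folding - with_folding:.3f}")
```

With the exercise's numbers and the assumed -1 hit penalty, the term moves from about +0.010 to about -0.035 cycles per instruction. Calling the helper with other `hit_rate` values is one way to examine the break-even question.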
