Instruction-Level Parallelism and Its Exploitation - Computer Architecture: A Quantitative Approach

Return Address Predictors

As we try to increase the opportunity and accuracy of speculation, we face the challenge of predicting indirect jumps, that is, jumps whose destination address varies at runtime. Although high-level language programs will generate such jumps for indirect procedure calls, select or case statements, and FORTRAN-computed gotos, the vast majority of the indirect jumps come from procedure returns. For example, for the SPEC95 benchmarks, procedure returns account for more than 15% of the branches and the vast majority of the indirect jumps on average. For object-oriented languages such as C++ and Java, procedure returns are even more frequent. Thus, focusing on procedure returns seems appropriate.

Though procedure returns can be predicted with a branch-target buffer, the accuracy of such a prediction technique can be low if the procedure is called from multiple sites and the calls from one site are not clustered in time: the buffer remembers a single target for the return instruction, so it mispredicts whenever consecutive calls come from different sites. For example, in SPEC CPU95, an aggressive branch predictor achieves an accuracy of less than 60% for such return branches. To overcome this problem, some designs use a small buffer of return addresses operating as a stack. This structure caches the most recent return addresses, pushing a return address on the stack at a call and popping one off at a return. If the buffer is sufficiently large (i.e., as large as the maximum call depth), it will predict the returns perfectly. Figure 3.24 shows the performance of such a return buffer with 0 to 16 elements for a number of the SPEC CPU95 benchmarks. We will use a similar return predictor when we examine the studies of ILP in Section 3.10. Both the Intel Core processors and the AMD Phenom processors have return address predictors.
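
The contrast between a branch-target buffer and a return address stack can be made concrete with a small trace-driven model. The following is a minimal Python sketch under stated assumptions, not the simulator behind Figure 3.24: the trace format, the addresses, and the fixed 4-byte call instruction are hypothetical; the branch-target buffer is idealized as a table (`LastTargetPredictor`, a hypothetical name) that predicts each return will go wherever it went last time; and the return stack drops its oldest entry when full and makes no prediction when empty.

```python
from collections import deque


class ReturnAddressStack:
    """Small return-address buffer operated as a stack.

    Assumption for this sketch: when full, the oldest entry is discarded;
    when empty, no prediction (None) is made.
    """

    def __init__(self, depth):
        self.entries = deque(maxlen=depth)  # maxlen=0 models "no buffer"

    def push(self, return_address):
        # At a call: remember where the callee should return to.
        self.entries.append(return_address)

    def predict_and_pop(self):
        # At a return: predict the most recently pushed address.
        return self.entries.pop() if self.entries else None


class LastTargetPredictor:
    """Idealized BTB-style predictor: a return goes where it went last time."""

    def __init__(self):
        self.last_target = {}  # return-instruction address -> last target

    def predict(self, return_pc):
        return self.last_target.get(return_pc)

    def update(self, return_pc, actual_target):
        self.last_target[return_pc] = actual_target


def simulate(trace, ras_depth):
    """Return (return-stack accuracy, last-target accuracy) over a trace."""
    ras = ReturnAddressStack(ras_depth)
    btb = LastTargetPredictor()
    call_stack = []  # ground truth: the real return addresses
    ras_hits = btb_hits = returns = 0
    for kind, pc in trace:
        if kind == "call":
            return_address = pc + 4  # hypothetical fixed 4-byte call instruction
            call_stack.append(return_address)
            ras.push(return_address)
        else:  # "ret": pc is the address of the return instruction
            actual = call_stack.pop()
            returns += 1
            ras_hits += int(ras.predict_and_pop() == actual)
            btb_hits += int(btb.predict(pc) == actual)
            btb.update(pc, actual)
    return ras_hits / returns, btb_hits / returns


# Hypothetical trace 1: procedure f (return at 0x2000) called alternately
# from two call sites, so consecutive returns go to different addresses.
alternating = []
for _ in range(4):
    alternating += [("call", 0x100), ("ret", 0x2000),
                    ("call", 0x200), ("ret", 0x2000)]

# Hypothetical trace 2: recursion four calls deep (return at 0x3000).
recursive = [("call", 0x300)] + [("call", 0x400)] * 3 + [("ret", 0x3000)] * 4

if __name__ == "__main__":
    print(simulate(alternating, ras_depth=1))  # (1.0, 0.0)
    print(simulate(recursive, ras_depth=2))    # (0.5, 0.5)
    print(simulate(recursive, ras_depth=4))    # (1.0, 0.5)
```

On the alternating trace the last-target model mispredicts every return, while even a one-entry stack is perfect. On the recursive trace a two-entry stack loses the two oldest return addresses and misses the last two returns, whereas a four-entry stack, matching the maximum call depth, predicts every return correctly. The idealized table ignores the finite capacity and tagging of a real branch-target buffer.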
