Return Address Predictors
As we try to increase the opportunity and accuracy of speculation, we face the challenge of
predicting indirect jumps, that is, jumps whose destination address varies at runtime. Although
high-level language programs will generate such jumps for indirect procedure calls, select or
case statements, and FORTRAN-computed gotos, the vast majority of the indirect jumps come
from procedure returns. For example, for the SPEC95 benchmarks, procedure returns account
for more than 15% of the branches and the vast majority of the indirect jumps on average. For
object-oriented languages such as C++ and Java, procedure returns are even more frequent.
Thus, focusing on procedure returns seems appropriate.
Though procedure returns can be predicted with a branch-target buffer, the accuracy of
such a prediction technique can be low if the procedure is called from multiple sites and
the calls from one site are not clustered in time. For example, in SPEC CPU95, an aggressive
branch predictor achieves an accuracy of less than 60% for such return branches. To overcome
this problem, some designs use a small buffer of return addresses operating as a stack. This
structure caches the most recent return addresses: pushing a return address on the stack at a
call and popping one off at a return. If the cache is sufficiently large (i.e., as large as the
maximum call depth), it will predict the returns perfectly. Figure 3.24 shows the performance of
such a return buffer with 0 to 16 elements for a number of the SPEC CPU95 benchmarks. We
will use a similar return predictor when we examine the studies of ILP in Section 3.10. Both
the Intel Core processors and the AMD Phenom processors have return address predictors.
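
To make the contrast between the two prediction schemes concrete, the following is a minimal software sketch, not a description of any real processor. It compares a branch-target-buffer-style predictor, which remembers the last target seen for each return instruction, with a small return address stack. The names Event, ReturnAddressStack, ras_accuracy, and btb_accuracy, the example addresses, and the policy of discarding the oldest entry on overflow are all assumptions made for illustration.

```python
from typing import Dict, List, NamedTuple, Optional


class Event(NamedTuple):
    kind: str    # "call" or "ret"
    pc: int      # address of the call or return instruction (hypothetical)
    target: int  # call: return address to push; ret: actual return target


class ReturnAddressStack:
    """Fixed-capacity stack of return addresses (illustrative model)."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.entries: List[int] = []

    def push(self, return_address: int) -> None:
        if self.capacity == 0:
            return
        if len(self.entries) == self.capacity:
            # Assumed overflow policy: drop the oldest entry.
            self.entries.pop(0)
        self.entries.append(return_address)

    def pop(self) -> Optional[int]:
        # An empty stack yields no prediction.
        return self.entries.pop() if self.entries else None


def ras_accuracy(trace: List[Event], capacity: int) -> float:
    """Fraction of returns predicted correctly by a stack of the given size."""
    ras = ReturnAddressStack(capacity)
    correct = total = 0
    for ev in trace:
        if ev.kind == "call":
            ras.push(ev.target)
        else:
            total += 1
            correct += ras.pop() == ev.target
    return correct / total if total else 1.0


def btb_accuracy(trace: List[Event]) -> float:
    """Fraction of returns predicted correctly by a last-target table."""
    last_target: Dict[int, int] = {}
    correct = total = 0
    for ev in trace:
        if ev.kind == "ret":
            total += 1
            correct += last_target.get(ev.pc) == ev.target
            last_target[ev.pc] = ev.target  # remember the most recent target
    return correct / total if total else 1.0


# One procedure (return instruction at 0x900) called alternately from two sites.
alternating: List[Event] = []
for _ in range(4):
    for site in (0x100, 0x200):
        alternating.append(Event("call", site, site + 4))
        alternating.append(Event("ret", 0x900, site + 4))

# Three nested calls followed by the three matching returns.
nested: List[Event] = [
    Event("call", 0x100, 0x104),
    Event("call", 0x300, 0x304),
    Event("call", 0x500, 0x504),
    Event("ret", 0x990, 0x504),
    Event("ret", 0x980, 0x304),
    Event("ret", 0x970, 0x104),
]

if __name__ == "__main__":
    print("alternating, last-target:", btb_accuracy(alternating))
    print("alternating, 1-entry stack:", ras_accuracy(alternating, 1))
    for size in (0, 2, 3):
        print(f"nested, {size}-entry stack:", ras_accuracy(nested, size))
```

On the alternating trace, the last-target table mispredicts every return because the calls from each site are never clustered in time, while even a one-entry stack predicts all of them. On the nested trace, the stack is perfect only once its capacity reaches the call depth of three, which mirrors the condition stated in the text.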