control variable is incremented on every loop iteration, the loop contains at least one dependence. As we show in Appendix H, loop unrolling and aggressive algebraic optimization can remove such dependent computation. Wall's study includes a limited amount of such optimization, but applying it more aggressively could expose more ILP. In addition, certain code generation conventions introduce unneeded dependences, in particular the use of return address registers and a register for the stack pointer (which is incremented and decremented in the call/return sequence). Wall removes the effect of the return address register, but the use of a stack pointer in the linkage convention can still cause “unnecessary” dependences. Postiff et al. [1999] explored the advantages of removing this constraint.
3. Overcoming the data flow limit: If value prediction worked with high accuracy, it could overcome the data flow limit. As of yet, none of the more than 100 papers on the subject has achieved a significant enhancement in ILP when using a realistic prediction scheme. Obviously, perfect data value prediction would lead to effectively infinite parallelism, since every value of every instruction could be predicted a priori.
For a less-than-perfect processor, several ideas have been proposed that could expose more ILP. One example is to speculate along multiple paths. This idea was discussed by Lam and Wilson [1992] and explored in the study covered in this section. By speculating on multiple paths, the processor reduces the cost of recovering from incorrect speculation and can uncover more parallelism. It only makes sense to evaluate this scheme for a limited number of branches because the hardware resources required grow exponentially with the number of branches. Wall [1993] provided data for speculating in both directions on up to eight branches. Given the costs of pursuing both paths, knowing that one will be thrown away (and the growing amount of useless computation as such a process is followed through multiple branches), every commercial design has instead devoted additional hardware to better speculation on the correct path.
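The exponential growth is easy to quantify under a deliberately simple model (an assumption for illustration): if both directions of each of k unresolved branches are followed, up to 2^k paths are in flight, and only one of them is the correct path. The helper below is hypothetical and ignores any pruning of paths as branches resolve.

```python
def multipath_cost(k):
    """Paths in flight when both directions of k unresolved branches are
    followed, and the fraction of those paths that will be discarded."""
    paths = 2 ** k
    wasted_fraction = (paths - 1) / paths
    return paths, wasted_fraction


for k in (1, 2, 4, 8):
    paths, wasted = multipath_cost(k)
    print(f"{k} branches: {paths:3d} paths, {wasted:.1%} discarded")
```

At eight branches, the upper end of Wall's [1993] data, this model gives 256 paths with all but one discarded, which suggests why designers preferred to spend hardware on better prediction of the single correct path.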
It is critical to understand that none of the limits in this section is fundamental in the sense that overcoming them requires a change in the laws of physics! Instead, they are practical limitations that imply the existence of some formidable barriers to exploiting additional ILP. These limitations, whether they be window size, alias detection, or branch prediction, represent challenges for designers and researchers to overcome.
Attempts to break through these limits in the first five years of this century met with frustration. Some techniques produced small improvements, but often at the cost of significant increases in complexity, longer clock cycles, and disproportionate increases in power. In summary, designers discovered that trying to extract more ILP was simply too inefficient. We will return to this discussion in our concluding remarks.
3.11 Cross-Cutting Issues: ILP Approaches and the Memory System
Hardware Versus Software Speculation
The hardware-intensive approaches to speculation in this chapter and the software approaches of Appendix H offer alternative ways to exploit ILP. Some of the trade-offs, and the limitations, of these approaches are listed below:
■ To speculate extensively, we must be able to disambiguate memory references. This is difficult to do at compile time for integer programs that contain pointers. In a hardware-based scheme, dynamic runtime disambiguation of memory addresses is done using the techniques we saw earlier for Tomasulo's algorithm. This disambiguation allows