The ideal pipeline CPI is a measure of the maximum performance attainable by the
implementation. By reducing each of the terms of the right-hand side, we decrease the overall
pipeline CPI or, alternatively, increase the IPC (instructions per clock). The equation above
allows us to characterize various techniques by what component of the overall CPI a technique
reduces. Figure 3.1 shows the techniques we examine in this chapter and in Appendix H, as
well as the topics covered in the introductory material in Appendix C. In this chapter, we will
see that the techniques we introduce to decrease the ideal pipeline CPI can increase the
importance of dealing with hazards.
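The decomposition this paragraph refers to can be sketched in C. The equation itself falls outside this excerpt, so the sketch assumes the usual form, in which the pipeline CPI is the ideal pipeline CPI plus the average stall cycles per instruction from structural hazards, data hazards, and control hazards. The struct, the function names, and the numbers in the usage note are hypothetical.

```c
/* Assumed form of the CPI equation (not shown in this excerpt):
 *   pipeline CPI = ideal CPI + structural stalls
 *                + data hazard stalls + control stalls
 * Each stall term is an average number of stall cycles per instruction. */
struct cpi_terms {
    double ideal;
    double structural_stalls;
    double data_hazard_stalls;
    double control_stalls;
};

/* Sum the terms of the right-hand side. */
double pipeline_cpi(struct cpi_terms t) {
    return t.ideal + t.structural_stalls
         + t.data_hazard_stalls + t.control_stalls;
}

/* IPC is the reciprocal of CPI. */
double ipc(struct cpi_terms t) {
    return 1.0 / pipeline_cpi(t);
}
```

With illustrative terms of 1.0, 0.0, 0.25, and 0.25, the pipeline CPI is 1.5. Halving the control stalls to 0.125 lowers it to 1.375 and raises the IPC from about 0.67 to about 0.73, which shows how a technique can be characterized by the single term it reduces.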
FIGURE 3.1 The major techniques examined in Appendix C, Chapter 3, and Appendix H
are shown together with the component of the CPI equation that the technique affects.
What Is Instruction-Level Parallelism?
All the techniques in this chapter exploit parallelism among instructions. The amount of
parallelism available within a basic block (a straight-line code sequence with no branches in
except to the entry and no branches out except at the exit) is quite small. For typical MIPS
programs, the average dynamic branch frequency is often between 15% and 25%, meaning that
between three and six instructions execute between a pair of branches. Since these instructions
are likely to depend upon one another, the amount of overlap we can exploit within a basic
block is likely to be less than the average basic block size. To obtain substantial performance
enhancements, we must exploit ILP across multiple basic blocks.
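The arithmetic behind the three-to-six figure can be checked with a short sketch. It assumes that branches are spread evenly through the dynamic instruction stream, so that one instruction in every 1/f is a branch when f is the dynamic branch frequency. The function name is illustrative and does not come from the text.

```c
/* Average number of non-branch instructions executed between two
 * consecutive branches, given the dynamic branch frequency f (0 < f <= 1).
 * Assumption: branches are evenly distributed, so there is one branch per
 * 1/f instructions, leaving 1/f - 1 instructions between branches. */
double instructions_between_branches(double branch_frequency) {
    return 1.0 / branch_frequency - 1.0;
}
```

At a 25% branch frequency this gives 3 instructions, and at 15% it gives about 5.7, consistent with the range of three to six quoted above.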
The simplest and most common way to increase the ILP is to exploit parallelism among
iterations of a loop. This type of parallelism is often called loop-level parallelism. Here is a
simple example of a loop that adds two 1000-element arrays and is completely parallel:
for (i=0; i<=999; i=i+1)
    x[i] = x[i] + y[i];
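Because no iteration reads a value written by another iteration, the iterations can be executed in any order or overlapped. One way to make that visible is to unroll the loop by four, as in the sketch below. It assumes double-precision arrays that do not overlap in memory, and the function names and the constant N are hypothetical.

```c
#define N 1000  /* array length used in the text's example */

/* The loop from the text: iteration i touches only x[i] and y[i].
 * Assumption: x and y do not overlap in memory. */
void add_arrays(double x[N], const double y[N]) {
    for (int i = 0; i <= N - 1; i = i + 1)
        x[i] = x[i] + y[i];
}

/* The same computation unrolled by four (N is a multiple of 4).
 * The four statements in the body do not depend on one another,
 * so a pipelined processor is free to overlap their execution. */
void add_arrays_unrolled(double x[N], const double y[N]) {
    for (int i = 0; i < N; i += 4) {
        x[i]     = x[i]     + y[i];
        x[i + 1] = x[i + 1] + y[i + 1];
        x[i + 2] = x[i + 2] + y[i + 2];
        x[i + 3] = x[i + 3] + y[i + 3];
    }
}
```

Both functions leave the same values in x. Unrolling by hand is only one illustration of exposing the independent work, not necessarily the method the chapter goes on to develop.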
 