4.5 Detecting and Enhancing Loop-Level Parallelism
Loops in programs are the fountainhead of many of the types of parallelism we discussed above and in Chapter 5. In this section, we discuss compiler technology for discovering the amount of parallelism that we can exploit in a program as well as hardware support for these compiler techniques. We define precisely when a loop is parallel (or vectorizable), how dependence can prevent a loop from being parallel, and techniques for eliminating some types of dependences. Finding and manipulating loop-level parallelism is critical to exploiting both DLP and TLP, as well as the more aggressive static ILP approaches (e.g., VLIW) that we examine in Appendix H.
Loop-level parallelism is normally analyzed at the source level or close to it, while most analysis of ILP is done once instructions have been generated by the compiler. Loop-level analysis involves determining what dependences exist among the operands in a loop across the iterations of that loop. For now, we will consider only data dependences, which arise when an operand is written at some point and read at a later point. Name dependences also exist and may be removed by the renaming techniques discussed in Chapter 3.
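As a quick illustration (ours, not an example from Chapter 3), a scalar temporary reused across iterations creates only name dependences, which renaming removes; the arrays a, b, and c and the temporary t below are hypothetical:

/* Name dependences on t: iteration i+1 must not write t before
   iteration i has read it (an antidependence), and the writes to t
   in successive iterations form an output dependence. No data
   flows between iterations. */
for (i = 0; i < 100; i = i + 1) {
    t = a[i] + b[i];
    c[i] = 2.0 * t;
}

/* Renaming t to a fresh, per-iteration local removes the name
   dependences, leaving each iteration independent. */
for (i = 0; i < 100; i = i + 1) {
    double t1 = a[i] + b[i];
    c[i] = 2.0 * t1;
}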
The analysis of loop-level parallelism focuses on determining whether data accesses in later iterations are dependent on data values produced in earlier iterations; such a dependence is called a loop-carried dependence. Most of the examples we considered in Chapters 2 and 3 had no loop-carried dependences and, thus, are loop-level parallel. To see that a loop is parallel, let us first look at the source representation:
for (i=999; i>=0; i=i-1)
    x[i] = x[i] + s;
In this loop, the two uses of x[i] are dependent, but this dependence is within a single iteration and is not loop carried. There is a loop-carried dependence between successive uses of i in different iterations, but this dependence involves an induction variable that can be easily recognized and eliminated. We saw examples of how to eliminate dependences involving induction variables during loop unrolling in Section 2.2 of Chapter 2, and we will look at additional examples later in this section.
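As a sketch of that transformation (our illustration, in the spirit of Section 2.2), unrolling the loop above by a factor of four replaces the per-iteration induction-variable updates with explicit offsets, leaving four independent statements in each iteration; we assume the trip count of 1000 is divisible by four:

for (i = 999; i >= 0; i = i - 4) {
    x[i]   = x[i]   + s;  /* the four statements have no   */
    x[i-1] = x[i-1] + s;  /* dependences among themselves, */
    x[i-2] = x[i-2] + s;  /* so they may be scheduled or   */
    x[i-3] = x[i-3] + s;  /* executed in parallel          */
}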
Because finding loop-level parallelism involves recognizing structures such as loops, array
references, and induction variable computations, the compiler can do this analysis more easily
at or near the source level, as opposed to the machine-code level. Let's look at a more complex
example.
Example
Consider a loop like this one:
for (i=0; i<100; i=i+1) {
    A[i+1] = A[i] + C[i];   /* S1 */
    B[i+1] = B[i] + A[i+1]; /* S2 */
}
Assume that A, B, and C are distinct, nonoverlapping arrays. (In practice, the arrays may sometimes be the same or may overlap. Because the arrays may be passed as parameters to a procedure that includes this loop, determining whether arrays overlap or are identical often requires sophisticated, interprocedural analysis of the program.) What are the data dependences among the statements S1 and S2 in the loop?
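To make the overlap concern concrete, here is a hedged sketch (ours; the procedure compute, the caller, and the buffer are hypothetical) of a call site where the three parameter arrays alias one another, which is exactly what interprocedural analysis must rule out:

void compute(double *A, double *B, double *C) {
    for (int i = 0; i < 100; i = i + 1) {
        A[i+1] = A[i] + C[i];   /* S1 */
        B[i+1] = B[i] + A[i+1]; /* S2 */
    }
}

void caller(void) {
    double buf[201];
    /* The three "arrays" are slices of one buffer: B[0] is the same
       element as A[50], so the nonoverlap assumption fails, and the
       compiler cannot prove independence without seeing this call. */
    compute(&buf[0], &buf[50], &buf[100]);
}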