Hardware Reference
In-Depth Information
Answer
There are two different dependences:
1. S1 uses a value computed by S1 in an earlier iteration, since iteration i com-
putes A[i+1] , which is read in iteration i+1 . The same is true of S2 for B[i]
and B[i+1] .
2. S2 uses the value A[i+1] computed by S1 in the same iteration.
These two dependences are different and have different effects. To see how
they differ, let's assume that only one of these dependences exists at a time. Be-
cause the dependence of statement S1 is on an earlier iteration of S1 , this depend-
ence is loop carried. This dependence forces successive iterations of this loop to
execute in series.
The second dependence ( S2 depending on S1 ) is within an iteration and is not
loop carried. Thus, if this were the only dependence, multiple iterations of the
loop could execute in parallel, as long as each pair of statements in an iteration
were kept in order. We saw this type of dependence in an example in Section
2.2 , where unrolling was able to expose the parallelism. These intra-loop de-
pendences are common; for example, a sequence of vector instructions that uses
chaining exhibits exactly this sort of dependence.
It is also possible to have a loop-carried dependence that does not prevent
parallelism, as the next example shows.
Example
Consider a loop like this one:
for (i=0; i<100; i=i+1) {
A[i] = A[i] + B[i]; /* S1 */
B[i+1] = C[i] + D[i]; /* S2 */
}
What are the dependences between S1 and S2 ? Is this loop parallel? If not,
show how to make it parallel.
Answer
Statement S1 uses the value assigned in the previous iteration by statement S2 ,
so there is a loop-carried dependence between S2 and S1 . Despite this loop-car-
ried dependence, this loop can be made parallel. Unlike the earlier loop, this
dependence is not circular; neither statement depends on itself, and although S1
depends on S2 , S2 does not depend on S1 . A loop is parallel if it can be written
without a cycle in the dependences, since the absence of a cycle means that the
dependences give a partial ordering on the statements.
Although there are no circular dependences in the above loop, it must be
transformed to conform to the partial ordering and expose the parallelism. Two
observations are critical to this transformation:
Search WWH ::




Custom Search