Hardware Reference
In-Depth Information
Answer
There are two different dependences:
1.
S1
uses a value computed by
S1
in an earlier iteration, since iteration
i
com-
putes
A[i+1]
, which is read in iteration
i+1
. The same is true of
S2
for
B[i]
and
B[i+1]
.
2.
S2
uses the value
A[i+1]
computed by
S1
in the same iteration.
These two dependences are different and have different effects. To see how
they differ, let's assume that only one of these dependences exists at a time. Be-
cause the dependence of statement
S1
is on an earlier iteration of
S1
, this depend-
ence is loop carried. This dependence forces successive iterations of this loop to
execute in series.
The second dependence (
S2
depending on
S1
) is within an iteration and is not
loop carried. Thus, if this were the only dependence, multiple iterations of the
loop could execute in parallel, as long as each pair of statements in an iteration
were kept in order. We saw this type of dependence in an example in
Section
2.2
, where unrolling was able to expose the parallelism. These intra-loop de-
pendences are common; for example, a sequence of vector instructions that uses
chaining exhibits exactly this sort of dependence.
It is also possible to have a loop-carried dependence that does not prevent
parallelism, as the next example shows.
Example
Consider a loop like this one:
for (i=0; i<100; i=i+1) {
A[i] = A[i] + B[i]; /* S1 */
B[i+1] = C[i] + D[i]; /* S2 */
}
What are the dependences between
S1
and
S2
? Is this loop parallel? If not,
show how to make it parallel.
Answer
Statement S1 uses the value assigned in the previous iteration by statement
S2
,
so there is a loop-carried dependence between
S2
and
S1
. Despite this loop-car-
ried dependence, this loop can be made parallel. Unlike the earlier loop, this
dependence is not circular; neither statement depends on itself, and although
S1
depends on
S2
,
S2
does not depend on
S1
. A loop is parallel if it can be written
without a cycle in the dependences, since the absence of a cycle means that the
dependences give a partial ordering on the statements.
Although there are no circular dependences in the above loop, it must be
transformed to conform to the partial ordering and expose the parallelism. Two
observations are critical to this transformation:
Search WWH ::
Custom Search