Compiling for VLIW DSPs - Signal Processing Systems

Digital Signal Processing Reference

In-Depth Information

A(1);

FOR i FROM 1 TO N-1 DO

B(i);

C(i) || A(i+1);

END DO

B(N);

C(N);

FORiFROM1TONDO

A(i);

B(i);

C(i);

END DO

Fig. 11

Simple example: ( a ) Original loop, where A

(

)

, B

(

)

, C

(

)

denote operations in the loop

body that may compete for common resources, in this example B

(

)

and C

(

)

, and that may involve

both loop-independent data dependences, here A

(

) →

(

)

and A

(

) →

(

)

, and loop-carried data

dependences, here B

(

) →

(

)

, see the dependence graph in ( b ). ( c ) After software pipelining

Software pipelining has been researched intensively, both as a high-level loop

transformation (performed in the middle end of a compiler or even as source-

to-source program transformation) and as low-level optimization late in the code

generation process (performed in the back end of a compiler), after instruction

selection with resource allocation has been performed. The former approaches are

independent of particular instructions and functional units to be selected for all

operations in the loop, and thus have to rely on inaccurate cost estimations for

execution time, energy, or register pressure when comparing various alternatives,

while the actual cost will also depend on decisions made late in the code generation

process. The latter approaches are bound to fixed instructions and functional units,

and hence the flexibility of implementing the same abstract operation by a variety

of different target machine instructions, with different resource requirements and

latency behavior, is lost. In either case, optimization opportunities are missed

because interdependent problems are solved separately in different compiler phases.

Approaches to integrate software pipelining with other code generation tasks will be

discussed in Sect. 8 .

Software pipelining , also called cyclic scheduling , transforms a loop such as that

in Fig. 11 a into an equivalent loop whose body contains operations from different

iterations of the original loop, such as that in Fig. 11 c , which may result in faster

code on an instruction-level parallel architecture as e.g. C

could

be executed in parallel ( || ) because they are statically known to be independent

of each other and not to subscribe to the same hardware resources. This parallel

execution was not possible in the original version of the loop because the code

generator usually treats the loop body (a basic block) as a unit for scheduling and

resource allocation, and furthermore separates the code for C

(

)

and A

(

)

a backward branch to the loop entry. The body of the transformed loop is called

the kernel , the operations before the kernel that “fill” the pipeline (here A

(

)

and A

(

)

)are

called the prologue , and the operations after the kernel that “drain” the pipeline (here

(

)

), are called the epilogue of the software-pipelined loop. Software

pipelining thus overlaps in the new kernel the execution of operations originating

from different iterations of the original loop, as far as permitted by given dependence

and resource constraints, in order to solicit more opportunities for parallel execution

(

)

and C

(

)

Signal Processing Systems

Search WWH ::

Custom Search

Home