Compiling for VLIW DSPs - Signal Processing Systems

Digital Signal Processing Reference

In-Depth Information

Another technique is critical path scheduling . First, a critical path in the basic

block is detected; the nodes of that path are removed from the data flow graph and

scheduled in topological order, each in its own issue packet. For the residual data

flow graph, a critical path is determined etc., and this process is repeated until all

nodes in the data flow graph have been scheduled. If there is no appropriate free slot

in an issue packet to accommodate a node to be scheduled, a new issue packet is

inserted.

Time-optimal instruction scheduling for basic blocks is NP-complete for almost

any nontrivial target architecture, including most VLIW architectures. For special

combinations of simple target architectures and restricted data flow graph topologies

such as trees, polynomial-time optimal scheduling algorithms are known.

In the last decade, more expensive optimal methods for local instruction schedul-

ing have become more and more popular, driven by (1) the need to generate high-

quality code for embedded applications, (2) the fact that modern computers offer

the compiler many more CPU cycles that can be spent on advanced optimizations,

and (3) advances in general optimization problem solver technology, especially for

integer linear programming. For local instruction scheduling on general acyclic

data flow graphs, optimal algorithms based on integer linear programming [ 11 ,

34 , 57 , 68 , 93 ] , branch-and-bound [ 22 , 51 ] , constraint logic programming [ 10 ] and

dynamic programming [ 59 , 61 ] have been developed. Also, more expensive heuristic

optimization techniques such as genetic programming [ 34 , 72 , 99 ] have been used

successfully.

Even if the scope limitation to a single basic block is too restrictive in practice,

local instruction scheduling techniques are nevertheless significant because they are

also used in several global scheduling algorithms for larger acyclic code regions and

even in certain cyclic scheduling algorithms, which we will discuss in Sect. 7.2 .

7.2

Modulo Scheduling for Loops

Most DSP programs spend most of their execution time in some (inner) loops.

Efficient loop scheduling techniques are therefore key to high code quality.

Loop unrolling is a simple transformation that can increase the scope of a local

scheduler (and also other code optimizations) beyond the iteration boundaries,

such that independent instructions from different iterations could be scheduled in

parallel. However, loop unrolling increases code size considerably, which is often

undesirable in embedded applications.

Software pipelining is a technique to overlap the execution of subsequent

loop iterations such that independent instructions from different iterations can be

scheduled in parallel on an instruction-level parallel architecture, without having

to replicate the loop body code as in unrolling. As most scheduling problems

with resource and dependence constraints, (rate-)optimal software pipelining is

NP-complete.

Search WWH ::

Custom Search

Home