Digital Signal Processing Reference
In-Depth Information
Another technique is critical path scheduling . First, a critical path in the basic
block is detected; the nodes of that path are removed from the data flow graph and
scheduled in topological order, each in its own issue packet. For the residual data
flow graph, a critical path is determined etc., and this process is repeated until all
nodes in the data flow graph have been scheduled. If there is no appropriate free slot
in an issue packet to accommodate a node to be scheduled, a new issue packet is
inserted.
Time-optimal instruction scheduling for basic blocks is NP-complete for almost
any nontrivial target architecture, including most VLIW architectures. For special
combinations of simple target architectures and restricted data flow graph topologies
such as trees, polynomial-time optimal scheduling algorithms are known.
In the last decade, more expensive optimal methods for local instruction schedul-
ing have become more and more popular, driven by (1) the need to generate high-
quality code for embedded applications, (2) the fact that modern computers offer
the compiler many more CPU cycles that can be spent on advanced optimizations,
and (3) advances in general optimization problem solver technology, especially for
integer linear programming. For local instruction scheduling on general acyclic
data flow graphs, optimal algorithms based on integer linear programming [ 11 ,
34 , 57 , 68 , 93 ] , branch-and-bound [ 22 , 51 ] , constraint logic programming [ 10 ] and
dynamic programming [ 59 , 61 ] have been developed. Also, more expensive heuristic
optimization techniques such as genetic programming [ 34 , 72 , 99 ] have been used
successfully.
Even if the scope limitation to a single basic block is too restrictive in practice,
local instruction scheduling techniques are nevertheless significant because they are
also used in several global scheduling algorithms for larger acyclic code regions and
even in certain cyclic scheduling algorithms, which we will discuss in Sect. 7.2 .
7.2
Modulo Scheduling for Loops
Most DSP programs spend most of their execution time in some (inner) loops.
Efficient loop scheduling techniques are therefore key to high code quality.
Loop unrolling is a simple transformation that can increase the scope of a local
scheduler (and also other code optimizations) beyond the iteration boundaries,
such that independent instructions from different iterations could be scheduled in
parallel. However, loop unrolling increases code size considerably, which is often
undesirable in embedded applications.
Software pipelining is a technique to overlap the execution of subsequent
loop iterations such that independent instructions from different iterations can be
scheduled in parallel on an instruction-level parallel architecture, without having
to replicate the loop body code as in unrolling. As most scheduling problems
with resource and dependence constraints, (rate-)optimal software pipelining is
NP-complete.
 
Search WWH ::




Custom Search