Compiling for VLIW DSPs - Signal Processing Systems

Digital Signal Processing Reference

In-Depth Information

Computing a minimum cost covering for a directed acyclic graph (DAG)

is assumed to be NP-complete, but by splitting the DAG into trees processed

separately and forcing the shared nodes' results to be stored in registers, dynamic-

programming based bottom-up tree pattern matching techniques can still be used

as heuristic methods. Ertl [ 35 ] gives an algorithm to check if a given processor

instruction set (i.e., tree grammar) belongs to a special class of architectures

(containing e.g. MIPS and SPARC processors) where the constrained tree pattern

matching always produces optimal coverings for DAGs.

Another way to compute a minimum cost covering, albeit an expensive one,

is to apply advanced optimization methods such as integer linear programming

[ 11 , 33 , 67 , 94 ] or partitioned boolean quadratic programming [ 28 ] . This may be

an applicable choice if integer linear programming is also used for solving other

subtasks such as register allocation or instruction scheduling, the basic block and

the number of patterns are not too large, and a close integration between these

tasks is desired to produce high-quality code. Furthermore, solving the problem

by such general optimization techniques is by no means restricted to tree-shaped

patterns or tree-shaped IR graphs. In particular, they work well with complex

patterns such as forests (non-connected trees, as in Fig. 10 c ) and directed acyclic

graph (DAG) patterns, as with autoincrement load and store instructions or memory

read-modify-write instructions and even on cyclic IR structures, such as static single

assignment (SSA) representation. Because covering several nodes with a complex

pattern corresponds to merging these nodes, special care has to be taken with

forest and DAG patterns to avoid the creation of artificial dependence cycles by

the matching, which could lead to non-schedulable code [ 27 ] .

Instruction selection can be combined with resource allocation , i.e., the binding

of instructions to executing resources. For some instructions, there may be no

choice: for instance, on 'C62x, a multiply instruction can only be executed on a

.M unit. In case that the same instruction can execute on different functional units,

each with its own cost, latency and resource reservations, one could model these

variants as different instructions that just share the pattern defining the semantics.

Like instruction selection, resource allocation is often done before instruction

scheduling, but a tighter integration with scheduling would be helpful because

resource allocation clearly constrains the scheduler.

A natural extension of this approach is to also model cluster assignment for

instructions as a resource allocation problem and thus as extended instruction

selection problem. However, cluster allocation has traditionally been treated as a

separate problem; we will discuss it in Sect. 5 .

Further target-level optimizations could be modeled as part of instruction

selection. For instance, for special cases of IR patterns there could exist alternative

code sequences that may be faster, shorter, or more appropriate for later phases

in code generation. As an example, an ordinary integer multiplication by 2 can

be implemented in at least three different ways ( mutations ): by a MUL instruction,

maybe running on a separate multiplier unit, by an ADD instruction, and by a left-

shift ( SHL ) by one, each having different latency and resource requirements. The

ability to consider such mutations during instruction scheduling increases flexibility

and can thus improve code quality [ 80 ] .

Signal Processing Systems

Search WWH ::

Custom Search

Home