Digital Signal Processing Reference
In-Depth Information
Computing a minimum cost covering for a directed acyclic graph (DAG)
is assumed to be NP-complete, but by splitting the DAG into trees processed
separately and forcing the shared nodes' results to be stored in registers, dynamic-
programming based bottom-up tree pattern matching techniques can still be used
as heuristic methods. Ertl [ 35 ] gives an algorithm to check if a given processor
instruction set (i.e., tree grammar) belongs to a special class of architectures
(containing e.g. MIPS and SPARC processors) where the constrained tree pattern
matching always produces optimal coverings for DAGs.
Another way to compute a minimum cost covering, albeit an expensive one,
is to apply advanced optimization methods such as integer linear programming
[ 11 , 33 , 67 , 94 ] or partitioned boolean quadratic programming [ 28 ] . This may be
an applicable choice if integer linear programming is also used for solving other
subtasks such as register allocation or instruction scheduling, the basic block and
the number of patterns are not too large, and a close integration between these
tasks is desired to produce high-quality code. Furthermore, solving the problem
by such general optimization techniques is by no means restricted to tree-shaped
patterns or tree-shaped IR graphs. In particular, they work well with complex
patterns such as forests (non-connected trees, as in Fig. 10 c ) and directed acyclic
graph (DAG) patterns, as with autoincrement load and store instructions or memory
read-modify-write instructions and even on cyclic IR structures, such as static single
assignment (SSA) representation. Because covering several nodes with a complex
pattern corresponds to merging these nodes, special care has to be taken with
forest and DAG patterns to avoid the creation of artificial dependence cycles by
the matching, which could lead to non-schedulable code [ 27 ] .
Instruction selection can be combined with resource allocation , i.e., the binding
of instructions to executing resources. For some instructions, there may be no
choice: for instance, on 'C62x, a multiply instruction can only be executed on a
.M unit. In case that the same instruction can execute on different functional units,
each with its own cost, latency and resource reservations, one could model these
variants as different instructions that just share the pattern defining the semantics.
Like instruction selection, resource allocation is often done before instruction
scheduling, but a tighter integration with scheduling would be helpful because
resource allocation clearly constrains the scheduler.
A natural extension of this approach is to also model cluster assignment for
instructions as a resource allocation problem and thus as extended instruction
selection problem. However, cluster allocation has traditionally been treated as a
separate problem; we will discuss it in Sect. 5 .
Further target-level optimizations could be modeled as part of instruction
selection. For instance, for special cases of IR patterns there could exist alternative
code sequences that may be faster, shorter, or more appropriate for later phases
in code generation. As an example, an ordinary integer multiplication by 2 can
be implemented in at least three different ways ( mutations ): by a MUL instruction,
maybe running on a separate multiplier unit, by an ADD instruction, and by a left-
shift ( SHL ) by one, each having different latency and resource requirements. The
ability to consider such mutations during instruction scheduling increases flexibility
and can thus improve code quality [ 80 ] .
Search WWH ::




Custom Search