Digital Signal Processing Reference
In-Depth Information
Another extension of instruction selection concerns the identification of several
independent IR operations with same operator and short operand data types that
could be merged to utilize a SIMD instruction instead.
Also, the selection of short instruction formats to reduce code size can be
considered a subproblem of instruction selection. While beneficial for instruction
fetch bandwidth, short instruction formats constrain the number of operands, the
size of immediate fields or the set of accessible registers, which can have negative
effects on register allocation and instruction scheduling. For the 16-bit compact
instructions of 'C64x, Hahn et al. [ 50 ] explore the trade-off between performance
and code size.
5
Cluster Assignment for Clustered VLIW Architectures
Cluster assignment for clustered VLIW architectures can be done at IR level or
at target level. It maps each IR operator or instruction, respectively, to one of the
clusters. Also, variables and temporaries to be held in registers must be mapped to
a register file. Indeed, a value could reside in several register files if appropriate
data transfer instructions are added; this is also an issue of register allocation and
instruction scheduling and typically solved later than instruction cluster assignment
in most compilers, although there exist obvious interdependences.
There exist various techniques for cluster assignment for basic blocks. The goal
is to minimize the number of transfer instructions. Usually, heuristic solutions are
applied.
Ellis [ 31 ] gives a heuristic algorithm for cluster assignment called bottom-up
greedy (BUG) for basic blocks and traces (see later) that is applied before instruction
scheduling. Desoli [ 26 ] identifies sub-DAGs of the target-level dataflow graph that
are mapped to a cluster as a whole. Gangwar et al. [ 41 ] first decompose the target-
level dataflow graph into disjoint chains of nodes connected by dataflow edges. The
nodes of a chain will always be mapped to the same cluster. Chains are grouped
together by a greedy heuristic until there are as many chain groups left as clusters.
Finally, chain groups are heuristically assigned to clusters so that the residual cross-
chaingroup dataflow edges coincide with direct inter-cluster communication links
wherever possible. For many-cluster architectures where no fully connected inter-
cluster communication network is available, the algorithm tries to minimize the
communication distance accordingly, such that communicating chain groups are
preferably mapped to clusters that are close to each other.
Hierarchical partitioning heuristics are used e.g. by Aleta et al. [ 3 ] andChu
et al. [ 23 ] . Aleta et al. also consider replication of individual instructions in order to
reduce the amount of communication.
Beg and van Beek [ 12 ] use constraint programming to solve the cluster assign-
ment problem optimally for an idealized multi-cluster architecture with unlimited
inter-cluster communication bandwidth.
 
Search WWH ::




Custom Search