Digital Signal Processing Reference
In-Depth Information
the kernel) and thus a single register will not be sufficient; special care has to
be taken to access the “right” one at any time. There are two kinds of techniques
for such self-overlapping live ranges: hardware based techniques, such as rotating
register sets and register queues, and software techniques such as modulo variable
expansion [ 63 ] and live range splitting [ 89 ] . With modulo variable expansion ,the
kernel is unrolled and symbolic registers renamed until no live range self-overlaps
any more: if
denotes the maximum length of a self-overlapping live range, the
required unroll factor is
μ
ρ = μ /
II
, and the new initiation interval of the expanded
kernel is II = ρ
II . The drawback of modulo variable expansion is increased code
size and increased register need. An alternative approach is to avoid self-overlapping
live ranges apriori by splitting long live ranges on dependence cycles into shorter
ones, by inserting copy instructions.
Optimal methods for software pipelining based on integer linear programming
have been proposed e.g. by Badia et al. [ 8 ] , Govindarajan et al. [ 46 ] and Yang et al.
[ 97 ] . Combinations of modulo scheduling with other code generation tasks will be
discussed in Sect. 8 .
Software pipelining is often combined with loop unrolling. Especially if the
lower bound MinII is a non-integer value, loop unrolling before software pipelining
can improve throughput. Moreover, loop unrolling reduces loop overhead (at least
on processors that do not have hardware support for zero-overhead loops). The
downside is larger code size.
7.3
Global Instruction Scheduling
Basic blocks are the units of (procedure-)global control flow analysis. The basic
block graph of a program is a directed graph where the nodes correspond to the
basic blocks and edges show control flow transitions between basic blocks, such as
branches or fall-through transitions to branch targets.
Global instruction scheduling methods consider several basic blocks at a time and
allow to move instructions between basic blocks. The (current) scope of a global
scheduling method is referred to as a region . Regions used for global scheduling
include traces, superblocks, hyperblocks and treegions. Local scheduling methods
are extended to address entire regions. Because the scope is larger, global scheduling
has more flexibility and may generate better code than local scheduling. Program
transformations such as function inlining, loop unrolling or predication can be
applied to additionally increase the size of basic blocks and regions.
Theideaof trace scheduling [ 36 ] is to make the most frequently executed
control flow paths fast while accepting possible performance degradations along less
frequently used paths. Execution frequencies are assigned to the outgoing edges at
branch instructions based on static predictions or on profile data. A trace is a linear
path (i.e., free of backwards edges and thereby of loops) through the basic block
graph, where, at each basic block B i in the trace except for the last one, its successor
Search WWH ::




Custom Search