Digital Signal Processing Reference
In-Depth Information
the kernel) and thus a single register will not be sufficient; special care has to
be taken to access the “right” one at any time. There are two kinds of techniques
for such self-overlapping live ranges: hardware based techniques, such as rotating
register sets and register queues, and software techniques such as modulo variable
kernel is unrolled and symbolic registers renamed until no live range self-overlaps
any more: if
denotes the maximum length of a self-overlapping live range, the
required unroll factor is
μ
ρ
=
μ
/
II
, and the new initiation interval of the expanded
kernel is
II
=
ρ
II
. The drawback of modulo variable expansion is increased code
size and increased register need. An alternative approach is to avoid self-overlapping
live ranges
apriori
by splitting long live ranges on dependence cycles into shorter
ones, by inserting copy instructions.
Optimal methods for software pipelining based on integer linear programming
Software pipelining is often combined with loop unrolling. Especially if the
lower bound
MinII
is a non-integer value, loop unrolling before software pipelining
can improve throughput. Moreover, loop unrolling reduces loop overhead (at least
on processors that do not have hardware support for zero-overhead loops). The
downside is larger code size.
7.3
Global Instruction Scheduling
Basic blocks are the units of (procedure-)global control flow analysis. The
basic
block graph
of a program is a directed graph where the nodes correspond to the
basic blocks and edges show control flow transitions between basic blocks, such as
branches or fall-through transitions to branch targets.
Global instruction scheduling methods consider several basic blocks at a time and
allow to move instructions between basic blocks. The (current) scope of a global
scheduling method is referred to as a
region
. Regions used for global scheduling
include traces, superblocks, hyperblocks and treegions. Local scheduling methods
are extended to address entire regions. Because the scope is larger, global scheduling
has more flexibility and may generate better code than local scheduling. Program
transformations such as function inlining, loop unrolling or predication can be
applied to additionally increase the size of basic blocks and regions.
control flow paths fast while accepting possible performance degradations along less
frequently used paths. Execution frequencies are assigned to the outgoing edges at
branch instructions based on static predictions or on profile data. A
trace
is a linear
path (i.e., free of backwards edges and thereby of loops) through the basic block
graph, where, at each basic block
B
i
in the trace except for the last one, its successor