Compiling for VLIW DSPs - Signal Processing Systems

Digital Signal Processing Reference

In-Depth Information

the kernel) and thus a single register will not be sufficient; special care has to

be taken to access the “right” one at any time. There are two kinds of techniques

for such self-overlapping live ranges: hardware based techniques, such as rotating

register sets and register queues, and software techniques such as modulo variable

expansion [ 63 ] and live range splitting [ 89 ] . With modulo variable expansion ,the

kernel is unrolled and symbolic registers renamed until no live range self-overlaps

any more: if

denotes the maximum length of a self-overlapping live range, the

required unroll factor is

μ

ρ = μ /

II

, and the new initiation interval of the expanded

kernel is II = ρ

II . The drawback of modulo variable expansion is increased code

size and increased register need. An alternative approach is to avoid self-overlapping

live ranges apriori by splitting long live ranges on dependence cycles into shorter

ones, by inserting copy instructions.

Optimal methods for software pipelining based on integer linear programming

have been proposed e.g. by Badia et al. [ 8 ] , Govindarajan et al. [ 46 ] and Yang et al.

[ 97 ] . Combinations of modulo scheduling with other code generation tasks will be

discussed in Sect. 8 .

Software pipelining is often combined with loop unrolling. Especially if the

lower bound MinII is a non-integer value, loop unrolling before software pipelining

can improve throughput. Moreover, loop unrolling reduces loop overhead (at least

on processors that do not have hardware support for zero-overhead loops). The

downside is larger code size.

7.3

Global Instruction Scheduling

Basic blocks are the units of (procedure-)global control flow analysis. The basic

block graph of a program is a directed graph where the nodes correspond to the

basic blocks and edges show control flow transitions between basic blocks, such as

branches or fall-through transitions to branch targets.

Global instruction scheduling methods consider several basic blocks at a time and

allow to move instructions between basic blocks. The (current) scope of a global

scheduling method is referred to as a region . Regions used for global scheduling

include traces, superblocks, hyperblocks and treegions. Local scheduling methods

are extended to address entire regions. Because the scope is larger, global scheduling

has more flexibility and may generate better code than local scheduling. Program

transformations such as function inlining, loop unrolling or predication can be

applied to additionally increase the size of basic blocks and regions.

Theideaof trace scheduling [ 36 ] is to make the most frequently executed

control flow paths fast while accepting possible performance degradations along less

frequently used paths. Execution frequencies are assigned to the outgoing edges at

branch instructions based on static predictions or on profile data. A trace is a linear

path (i.e., free of backwards edges and thereby of loops) through the basic block

graph, where, at each basic block B i in the trace except for the last one, its successor

Signal Processing Systems

Search WWH ::

Custom Search

Home