3.1.2 Offset Assignment
Address generation units (AGUs) perform address calculations in parallel with the other ongoing computations of the DSP. Exploiting these dedicated functional units efficiently is the compiler's task. For example, an AGU might compute the address of the next data item to be accessed by adding a constant (small) modifier (e.g. +1 or -1) to the current data address, keeping the new address ready for the next memory operation to come. This, however, is only possible if the next data item can be placed within this range.
Placing variables in memory in such a way that for a given sequence of memory
accesses the best use of parallel address computations can be made is one of the
possible compiler optimizations targeting AGUs. The goal of offset assignment is
then “to generate a data memory layout such that the number of address offsets
of two consecutive memory accesses that are outside the auto-modify range is
minimized” [ 47 ] .
For example, consider the following sequence of accesses to the variables a, b, c, d, e, and f: S = {a, b, c, d, e, f, a, d, a, d, a, c, d, f, a, d}. Assuming an auto-modify range of ±1, i.e. address calculations referring to the data item stored at the address immediately before or after the current item are free, and a single index register, we compute the addressing costs for two different memory layout schemes in Fig. 3.
Each time the index register needs to be rewritten because an element not neighboring the current position is accessed, a cost of 1 is incurred (dotted arrows). For the naïve scheme shown in the upper half of the diagram the total addressing cost is 9, whereas in the improved memory layout shown in the bottom half of the diagram the total cost is reduced to 4.
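The cost model used in this example can be sketched in a few lines. The helper below is hypothetical (it is not taken from [47]); it assumes a single index register and that only register reloads, not the initial load, are counted, which reproduces the costs of 9 and 4 quoted above. The improved layout used here is one that achieves cost 4; the layout in Fig. 3 may differ.

```python
def addressing_cost(layout, accesses, modify_range=1):
    """Count index-register reloads for an access sequence.

    layout: variables in memory order (address = list index).
    accesses: variable names in the order they are accessed.
    An access is free if its address lies within +/-modify_range of the
    previous address; otherwise the register is reloaded (cost 1).
    """
    addr = {v: i for i, v in enumerate(layout)}
    cost = 0
    prev = None
    for v in accesses:
        if prev is not None and abs(addr[v] - prev) > modify_range:
            cost += 1
        prev = addr[v]
    return cost

S = list("abcdefadadacdfad")
print(addressing_cost(list("abcdef"), S))  # naive layout: 9 reloads
print(addressing_cost(list("bcdafe"), S))  # an improved layout: 4 reloads
```

Note that the layout b, c, d, a, f, e places a next to both d and f, the two most frequent transition partners of a in S, which is exactly what makes most of the accesses free.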
In general, the offset assignment problem has a large solution space and can
be optimally solved for small instances only. Several heuristics that have been
developed are based on finding a maximum-weighted Hamiltonian path in the access
graph by using a greedy edge selection strategy [ 47 ] .
3.2 Control Flow Optimizations
Control flow optimizations such as the elimination of branches to unconditional
branches, branches to the next basic block, branches to the same target and many
more are part of every optimizing compiler. However, some particular architectural
features of digital signal processors such as zero overhead loops and predicated
instructions require specialized control flow optimizations.
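As an illustration, the first of these classical optimizations, retargeting branches whose destination is itself an unconditional branch, can be sketched on a toy control-flow-graph encoding. The representation here (a label mapped to a kind and a branch target) is invented for illustration and much simpler than a real compiler IR.

```python
def collapse_branch_chains(blocks):
    """Branch-to-branch elimination (a sketch).

    blocks maps a label to ('jump', target) for a block that consists
    only of an unconditional branch, or ('code', target) otherwise;
    target may be None for a block that returns.
    """
    def resolve(label):
        seen = set()
        # Follow chains of jump-only blocks; guard against jump cycles.
        while label in blocks and blocks[label][0] == 'jump' and label not in seen:
            seen.add(label)
            label = blocks[label][1]
        return label
    # Retarget every branch past the forwarding blocks; the forwarding
    # blocks themselves become unreachable and are dropped.
    return {label: (kind, resolve(target) if target else target)
            for label, (kind, target) in blocks.items()
            if kind != 'jump'}

cfg = {
    'entry': ('code', 'trampoline'),
    'trampoline': ('jump', 'loop'),  # branch to an unconditional branch
    'loop': ('code', None),
}
print(collapse_branch_chains(cfg))
```

After the pass, 'entry' branches directly to 'loop' and the 'trampoline' block disappears, which is the effect the optimizations above aim for.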
3.2.1 Zero Overhead Loops
Zero overhead loops are an architectural feature found in many DSPs (see [45]) that helps implement efficient code for loops with iteration counts known at