3.1.2 Offset Assignment
Address generation units (AGUs) perform address calculations in parallel with the other ongoing computations of the DSP. Exploiting these dedicated functional units efficiently is the compiler's task. For example, an AGU might compute the address of the next data item to be accessed by adding a constant (small) modifier (e.g. +1 or -1) to the current data address, keeping the new address ready for the next memory operation to come. This, however, is only possible if the next data item can be placed within this range.
Placing variables in memory in such a way that for a given sequence of memory
accesses the best use of parallel address computations can be made is one of the
possible compiler optimizations targeting AGUs. The goal of offset assignment is
then “to generate a data memory layout such that the number of address offsets
of two consecutive memory accesses that are outside the auto-modify range is
minimized” [ 47 ] .
For example, consider the following sequence of accesses to the variables a, b, c, d, e, and f: S = {a, b, c, d, e, f, a, d, a, d, a, c, d, f, a, d}. Assuming an auto-modify range of ±1, i.e. address calculations referring to the data item stored at the address immediately before or after the current item are free, and a single index register, we compute the addressing costs for two different memory layout schemes in Fig. 3.
Each time the index register needs to be rewritten because an element not neighboring the current position is accessed, a cost of 1 is incurred (dotted arrows). For the naïve scheme shown in the upper half of the diagram the total addressing cost is 9, whereas in the improved memory layout shown in the bottom half of the diagram the total cost is reduced to 4.
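The cost model used in this example can be sketched in a few lines. The helper below is hypothetical (it is not taken from [47]); it assumes a single index register and that only register reloads, not the initial load, are counted, which reproduces the costs of 9 and 4 quoted above. The improved layout used here is one that achieves cost 4; the layout in Fig. 3 may differ.

```python
def addressing_cost(layout, accesses, modify_range=1):
    """Count index-register reloads for an access sequence.

    layout: variables in memory order (address = list index).
    accesses: variable names in the order they are accessed.
    An access is free if its address lies within +/-modify_range of the
    previous address; otherwise the register is reloaded (cost 1).
    """
    addr = {v: i for i, v in enumerate(layout)}
    cost = 0
    prev = None
    for v in accesses:
        if prev is not None and abs(addr[v] - prev) > modify_range:
            cost += 1
        prev = addr[v]
    return cost

S = list("abcdefadadacdfad")
print(addressing_cost(list("abcdef"), S))  # naive layout: 9 reloads
print(addressing_cost(list("bcdafe"), S))  # an improved layout: 4 reloads
```

Note that the layout b, c, d, a, f, e places a next to both d and f, the two most frequent transition partners of a in S, which is exactly what makes most of the accesses free.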
In general, the offset assignment problem has a large solution space and can
be optimally solved for small instances only. Several heuristics that have been
developed are based on finding a maximum-weighted Hamiltonian path in the access
graph by using a greedy edge selection strategy [ 47 ] .
3.2 Control Flow Optimizations
Control flow optimizations such as the elimination of branches to unconditional
branches, branches to the next basic block, branches to the same target and many
more are part of every optimizing compiler. However, some particular architectural
features of digital signal processors such as zero overhead loops and predicated
instructions require specialized control flow optimizations.
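As an illustration, the first of these classical optimizations, retargeting branches whose destination is itself an unconditional branch, can be sketched on a toy control-flow-graph encoding. The representation here (a label mapped to a kind and a branch target) is invented for illustration and much simpler than a real compiler IR.

```python
def collapse_branch_chains(blocks):
    """Branch-to-branch elimination (a sketch).

    blocks maps a label to ('jump', target) for a block that consists
    only of an unconditional branch, or ('code', target) otherwise;
    target may be None for a block that returns.
    """
    def resolve(label):
        seen = set()
        # Follow chains of jump-only blocks; guard against jump cycles.
        while label in blocks and blocks[label][0] == 'jump' and label not in seen:
            seen.add(label)
            label = blocks[label][1]
        return label
    # Retarget every branch past the forwarding blocks; the forwarding
    # blocks themselves become unreachable and are dropped.
    return {label: (kind, resolve(target) if target else target)
            for label, (kind, target) in blocks.items()
            if kind != 'jump'}

cfg = {
    'entry': ('code', 'trampoline'),
    'trampoline': ('jump', 'loop'),  # branch to an unconditional branch
    'loop': ('code', None),
}
print(collapse_branch_chains(cfg))
```

After the pass, 'entry' branches directly to 'loop' and the 'trampoline' block disappears, which is the effect the optimizations above aim for.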
3.2.1 Zero Overhead Loops
Zero overhead loops are an architectural feature found in many DSPs (see [45]) that helps implement efficient code for loops with iteration counts known at