Digital Signal Processing Reference
In-Depth Information
a
b
for
(i = 0; i < 16; i ++)
{
for (i = 0; i < 16; i += 2)
{
/
i = 0, 1, 2, ...
/
/
i = 0, 2, 4, ...
/
c+=a[i]
b[i ];
c += a[i]
b[ i ];
}
/
i = 1, 3, 5, ...
/
c+=a[i+1]
b[ i +1];
}
Fig. 6
Example: loop unrolling. ( a ) Original loop. ( b ) Loop unrolled twofold
3.3.1
Loop Unrolling
Loop unrolling is probably the single-most important code transformation for
any DSP application [ 42 ] . As the name suggests this transformation unrolls a
loop, i.e. the loop body is replicated one or more times and the step of the loop
iteration variable is adjusted accordingly. A simple example where the loop body is
duplicated and the loop step doubled is shown in Fig. 6 .
DSP applications spent most of their time in small, but compute-intensive loops.
To improve overall performance, efficient hardware utilization is paramount and one
way of achieving this is to merge consecutive loop iterations. The main benefits
from loop unrolling arise from a reduced number of control flow instructions
(conditional branches) due to fewer iterations of the unrolled loop and better
scheduling opportunities due to the larger loop body. For example, the number of
iterations of the unrolled loop in Fig. 6 b is half of that of the original loop in Fig. 6 a .
As a consequence the overhead associated with the comparison with the upper loop
bound and the increment of the loop iteration variable is reduced by a factor of 2 . 1
At the same time the size of the loop body is doubled. This provides the instruction
scheduler with more instructions to hide the latencies of e.g. memory accesses and
to better exploit the available instruction level parallelism.
For small loops complete unrolling can be beneficial, i.e. the entire surrounding
loop is flattened. For larger loops—with either a large number of iterations or a large
number of instructions contained in the loop body—this is not practical because of
the resulting code bloat.
Determining the optimal loop unroll factor is difficult [ 32 , 44 ] , in particular, if
the target machine comprises an instruction cache. Code expansion resulting from
loop unrolling can degrade the performance of the instruction cache. The added
scheduling freedom can result in additional register pressure due to increased live
ranges of variables. The resulting memory spills and reloads may negate the benefits
of unrolling. In addition, control flow instructions in the loop body complicate
unrolling decisions for example, if the compiler cannot determine that a loop may
take an early exit.
1 This reduction may be less if the original loop is implemented using Zero-Overhead Loops ,
see [ 45 ] .
 
 
 
Search WWH ::




Custom Search