Digital Signal Processing Reference
The first stage (the prolog) contains the instructions that build the second-stage loop
cycle, and the last stage (the epilog) contains the instructions that finish all loop
iterations. The compiler applies software pipelining when optimization level -o2 or
-o3 is selected. The most efficient software-pipelined code has loop trip counters
that count down, for example:
for (i = N; i != 0; i--)
A dot product example with word-wide, hand-coded pipelined code takes (N/2) + 8
cycles to obtain the sum of products of two arrays with N numbers in each array. This
translates to 108 cycles to find the sum of products of 200 numbers, as illustrated in
Chapter 8. This efficiency is obtained by using an instruction such as LDW to load a
32-bit word and then multiplying the lower and upper 16-bit halves separately with
the two instructions MPY and MPYH, respectively.
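The word-wide idea can be modeled in portable C. This is a minimal sketch, not the hand-coded pipelined assembly from Chapter 8. The names pack16, lo16, hi16, dot_packed, and pipelined_cycles are hypothetical helpers introduced here for illustration, and the sketch assumes the first sample of each pair sits in the low half of the word. One 32-bit read stands in for LDW, and the two half-word products stand in for MPY and MPYH.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Low 16 bits of a word as a signed value (the half that MPY multiplies). */
static int16_t lo16(uint32_t w) { return (int16_t)(w & 0xFFFFu); }

/* High 16 bits of a word as a signed value (the half that MPYH multiplies). */
static int16_t hi16(uint32_t w) { return (int16_t)(w >> 16); }

/* ASSUMPTION: the first sample of a pair is stored in the low half. */
static uint32_t pack16(int16_t first, int16_t second)
{
    return (uint32_t)(uint16_t)first | ((uint32_t)(uint16_t)second << 16);
}

/* Sum of products over n_words packed words, i.e. 2 * n_words samples. */
int32_t dot_packed(const uint32_t *a, const uint32_t *b, size_t n_words)
{
    int32_t sum_lo = 0, sum_hi = 0;
    for (size_t i = n_words; i != 0; i--) {      /* trip counter counts down */
        uint32_t wa = *a++;                      /* one word load = two samples */
        uint32_t wb = *b++;
        sum_lo += (int32_t)lo16(wa) * lo16(wb);  /* lower halves, as with MPY  */
        sum_hi += (int32_t)hi16(wa) * hi16(wb);  /* upper halves, as with MPYH */
    }
    return sum_lo + sum_hi;
}

/* Cycle count quoted in the text for N samples per array: (N/2) + 8. */
unsigned pipelined_cycles(unsigned n)
{
    assert(n % 2 == 0);  /* samples are consumed two at a time */
    return n / 2 + 8;
}
```

For the sample arrays {1, 2, 3, 4} and {5, 6, 7, 8}, packed into two words each, dot_packed returns 70, and pipelined_cycles(200) gives the 108 cycles quoted above.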
Removing the epilog section can also reduce the code size. The available options
-msn (n = 0, 1, 2) direct the compiler to favor code size reduction over
performance. Hand-coded software-pipelined code can be produced by first drawing a
dependency graph and setting up a scheduling table [8]. In Chapter 8 we discuss
software pipelining in conjunction with code efficiency.
3.20 CONSTRAINTS
3.20.1 Memory Constraints
Internal memory is organized into several banks so that loads and stores can occur
simultaneously. Since each bank of memory is single-ported, only one access to each
bank can be performed per cycle. Two memory accesses per cycle can be performed
if they do not access the same bank of memory. If multiple accesses are made to the
same bank of memory (within the same space) in one cycle, the pipeline stalls, and
execution takes additional cycles to complete.
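The bank rule can be sketched as a small address check. The bank layout below (four interleaved banks, each two bytes wide) is an assumption chosen only for illustration. The text does not give the bank count or width, and both depend on the device. The names bank_of and accesses_conflict are hypothetical, and each access is assumed to be a single aligned halfword in the same memory space.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* ASSUMPTION: illustrative layout only; real values are device-specific. */
enum { BANK_WIDTH_BYTES = 2, NUM_BANKS = 4 };

/* Bank selected by a halfword-aligned byte address under the assumed layout. */
unsigned bank_of(uint32_t byte_addr)
{
    assert(byte_addr % BANK_WIDTH_BYTES == 0);  /* aligned halfword access */
    return (byte_addr / BANK_WIDTH_BYTES) % NUM_BANKS;
}

/* True when two accesses issued in the same cycle hit the same
 * single-ported bank, which is the case that stalls the pipeline. */
bool accesses_conflict(uint32_t addr_a, uint32_t addr_b)
{
    return bank_of(addr_a) == bank_of(addr_b);
}
```

Under this assumed layout, addresses 0 and 8 fall in the same bank and would conflict, while addresses 0 and 2 fall in different banks and could be accessed in the same cycle.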
3.20.2 Cross-Path Constraints
Since there is one cross-path on each side of the two data paths, at most two
instructions per cycle can use cross-paths, one on each. The following code segment
is valid since each instruction uses a different cross-path:
ADD .L1x A1,B1,A0
|| MPY .M2x A2,B2,B3
whereas the following is not valid since both instructions use the same cross-path:
ADD .L1x A1,B1,A0
|| MPY .M1x A2,B2,A3
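The cross-path rule can be expressed as a small check over the instructions issued in one cycle. This is a sketch under stated assumptions: Side, Slot, and cross_paths_ok are hypothetical names, and each instruction is reduced to the side of its functional unit (.L1 and .M1 on side A, .L2 and .M2 on side B) and whether its unit name carries the x suffix.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum { SIDE_A = 0, SIDE_B = 1 } Side;

/* One instruction of a parallel group, reduced to what the rule needs. */
typedef struct {
    Side unit_side;   /* side of the functional unit: .L1/.M1 -> A, .L2/.M2 -> B */
    bool uses_cross;  /* true when the unit name has the x suffix */
} Slot;

/* Returns true if no cross-path is used by more than one instruction. */
bool cross_paths_ok(const Slot *packet, size_t n)
{
    unsigned used[2] = { 0, 0 };  /* cross-path uses counted per side */
    assert(n <= 8);               /* at most eight instructions issue per cycle */
    for (size_t i = 0; i < n; i++) {
        if (packet[i].uses_cross && ++used[packet[i].unit_side] > 1)
            return false;         /* the same cross-path is requested twice */
    }
    return true;
}
```

Encoding the first pair above (.L1x with .M2x) as {SIDE_A, true} and {SIDE_B, true} passes the check. The second pair (.L1x with .M1x) becomes two {SIDE_A, true} entries and is rejected.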