Digital Signal Processing Reference
; twosumfloat.asm ASM Code with two sums of products for floating-pt
MVK .S1 100, A1 ;count/2 into A1
|| ZERO .L1 A7 ;init A7 for accum of even terms
|| ZERO .L2 B7 ;init B7 for accum of odd terms
LOOP LDDW .D1 *A4++,A3:A2 ;64-bit-> register pair A2,A3
|| LDDW .D2 *B4++,B3:B2 ;64-bit-> register pair B2,B3
SUB .S1 A1,1,A1 ;decrement count
NOP 2 ;delay slots for LDDW
[A1] B .S2 LOOP ;branch to LOOP
MPYSP .M1x A2,B2,A6 ;lower 32-bit product in A6
|| MPYSP .M2x A3,B3,B6 ;upper 32-bit product in B6
NOP 3 ;3 delay slots for MPYSP
ADDSP .L1 A6,A7,A7 ;accum even terms in A7
|| ADDSP .L2 B6,B7,B7 ;accum odd terms in B7
;branch occurs here
NOP 3 ;delay slots for last ADDSP
ADDSP .L1x A7,B7,A4 ;final sum of even and odd terms
NOP 3 ;delay slots for ADDSP
FIGURE 8.10. ASM code with two sums of products for floating-point implementation (twosumfloat.asm).
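As a reading aid, the arithmetic performed by the listing can be sketched in C. This is only an illustrative counterpart, not code from the original text: the function name dotp_two_sums, its signature, and the requirement that n be even are assumptions made for the sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical C counterpart of twosumfloat.asm. The name and signature
   are illustrative assumptions; n must be even. */
float dotp_two_sums(const float *a, const float *b, size_t n)
{
    float sum_even = 0.0f;  /* plays the role of A7 */
    float sum_odd  = 0.0f;  /* plays the role of B7 */
    size_t i;

    assert(n % 2 == 0);

    /* n/2 passes, matching the count/2 loaded into A1 */
    for (i = 0; i < n; i += 2) {
        sum_even += a[i]     * b[i];      /* MPYSP .M1x, ADDSP .L1 */
        sum_odd  += a[i + 1] * b[i + 1];  /* MPYSP .M2x, ADDSP .L2 */
    }

    /* final ADDSP .L1x A7,B7,A4 */
    return sum_even + sum_odd;
}
```

Each pass handles one even-indexed and one odd-indexed term, which is why the ASM loop runs for half as many iterations as there are terms.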
B7. The instructions within the loop consume a total of 10 cycles per iteration, using 100 iterations (not 200), to yield a total of 10 × 100 = 1000 cycles.
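The 10-cycle figure can be checked by adding up the cycles of each execute packet in the loop of twosumfloat.asm, where instructions joined by || share a cycle. The following is a small sketch of that tally; the function names are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Cycles per pass through LOOP, one entry per execute packet.
   Instructions joined by || issue in the same cycle. */
int loop_cycles_per_iteration(void)
{
    static const int cycles[] = {
        1, /* LDDW  || LDDW  */
        1, /* SUB            */
        2, /* NOP 2          */
        1, /* [A1] B         */
        1, /* MPYSP || MPYSP */
        3, /* NOP 3          */
        1  /* ADDSP || ADDSP */
    };
    int total = 0;
    size_t i;

    for (i = 0; i < sizeof cycles / sizeof cycles[0]; i++)
        total += cycles[i];
    return total;
}

/* Total loop cycles for a given iteration count (hypothetical helper). */
int loop_total_cycles(int iterations)
{
    assert(iterations >= 0);
    return loop_cycles_per_iteration() * iterations;
}
```

With 100 iterations this gives the 1000 cycles stated in the text. The tally covers the loop only, not the setup before it or the final ADDSP and NOP after it.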
8.5 SOFTWARE PIPELINING FOR CODE OPTIMIZATION
Software pipelining is a scheme for writing efficient ASM code so that all the functional units are utilized within one cycle. Optimization levels -o2 and -o3 enable the code generator to generate (or attempt to generate) software-pipelined code.
There are three stages associated with software pipelining:
1. Prolog (warm-up). This stage contains instructions needed to build up the loop
kernel (cycle).
2. Loop kernel (cycle). Within this loop, all instructions are executed in parallel.
The entire loop kernel can be executed in one cycle, since all the instructions
within the loop kernel stage are in parallel.
3. Epilog (cool-off). This stage contains the instructions necessary to complete
all iterations.
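The three stages can be illustrated with a dot product restructured by hand in C. This is a minimal sketch under stated assumptions: the function name dotp_pipelined and the requirement n >= 2 are hypothetical. C statements still execute one after another, so the sketch shows only how the work of different iterations is split across prolog, kernel, and epilog, not actual parallel execution.

```c
#include <assert.h>

/* Hypothetical hand-pipelined dot product (illustrative only; n >= 2).
   Each kernel pass works on three different iterations at once:
   add for i, multiply for i+1, load for i+2. */
float dotp_pipelined(const float *a, const float *b, int n)
{
    float x, y, prod, sum = 0.0f;
    int i;

    assert(n >= 2);

    /* Prolog: start iterations 0 and 1 so the kernel has work in flight */
    x = a[0]; y = b[0];      /* load for iteration 0     */
    prod = x * y;            /* multiply for iteration 0 */
    x = a[1]; y = b[1];      /* load for iteration 1     */

    /* Kernel: the three statements use values produced in the previous
       pass, so they do not depend on each other within a pass */
    for (i = 0; i < n - 2; i++) {
        sum += prod;                 /* add for iteration i          */
        prod = x * y;                /* multiply for iteration i + 1 */
        x = a[i + 2]; y = b[i + 2];  /* load for iteration i + 2     */
    }

    /* Epilog: drain the iterations still in flight */
    sum += prod;             /* add for iteration n - 2      */
    prod = x * y;            /* multiply for iteration n - 1 */
    sum += prod;             /* add for iteration n - 1      */

    return sum;
}
```

Because the kernel statements are mutually independent, hardware with separate load, multiply, and add units could issue them in the same cycle, which is the property the loop kernel stage relies on.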
8.5.1 Procedure for Hand-Coded Software Pipelining
1. Draw a dependency graph.
2. Set up a scheduling table.
3. Obtain code from the scheduling table.
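The first two steps can be sketched as a small C routine that walks a dependency graph and computes the earliest cycle in which each instruction can issue, which is the starting point for a scheduling table. The struct node type and the function earliest_cycles are hypothetical names for this illustration. The delay-slot values in the usage note are assumptions: 3 for MPYSP and ADDSP follows the comments in twosumfloat.asm, and 4 for LDDW is assumed from the usual C6x load latency.

```c
#include <assert.h>

/* One node of the dependency graph (hypothetical type): an instruction,
   its number of delay slots, and the index of the node whose result it
   needs (-1 if it depends on nothing). */
struct node {
    const char *name;
    int delay_slots;
    int depends_on;
};

/* Fill start[] with the earliest issue cycle (1-based) of each node.
   Nodes must be ordered so that every dependency precedes its user. */
void earliest_cycles(const struct node *g, int n, int *start)
{
    int i;

    for (i = 0; i < n; i++) {
        int d = g[i].depends_on;

        assert(d < i);
        /* A result is usable one cycle after its delay slots elapse */
        start[i] = (d < 0) ? 1 : start[d] + g[d].delay_slots + 1;
    }
}
```

For the chain LDDW (4 delay slots), MPYSP (3), ADDSP (3), this places the instructions in cycles 1, 6, and 10, which agrees with their positions in the 10-cycle loop of twosumfloat.asm.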