Code Optimization - Digital Signal Processing and Applications with the C6713 and C6416 DSK

;twosumfloat.asm ASM Code with two sums of products for floating-pt

        MVK    .S1   100, A1         ;count/2 into A1
 ||     ZERO   .L1   A7              ;init A7 for accum of even terms
 ||     ZERO   .L2   B7              ;init B7 for accum of odd terms
LOOP    LDDW   .D1   *A4++,A3:A2     ;64-bit-> register pair A2,A3
 ||     LDDW   .D2   *B4++,B3:B2     ;64-bit-> register pair B2,B3
        SUB    .S1   A1,1,A1         ;decrement count
        NOP    2                     ;delay slots for LDW
 [A1]   B      .S2   LOOP            ;branch to LOOP
        MPYSP  .M1x  A2,B2,A6        ;lower 32-bit product in A6
 ||     MPYSP  .M2x  A3,B3,B6        ;upper 32-bit product in B6
        NOP    3                     ;3 delay slot for MPYSP
        ADDSP  .L1   A6,A7,A7        ;accum even terms in A7
 ||     ADDSP  .L2   B6,B7,B7        ;accum odd terms in B7
                                     ;branch occurs here
        NOP    3                     ;delay slots for last ADDSP
        ADDSP  .L1x  A7,B7,A4        ;final sum of even and odd terms
        NOP    3                     ;delay slots for ADDSP

FIGURE 8.10. ASM code with two sums of products for floating-point implementation (twosumfloat.asm).

B7. The instructions within the loop consume a total of 10 cycles, using 100 iterations (not 200), to yield a total of 10 × 100 = 1000 cycles.
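The two-accumulator idea and the cycle count above can be sketched in C. This is only an illustrative model, not code from the text: the names dot_two_sums and loop_cycles are hypothetical, and the array length is assumed to be even because each pass consumes one pair of values from each array, as the LDDW loads do.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: dot product with separate even/odd accumulators,
   mirroring A7 (even terms) and B7 (odd terms) in the ASM listing.
   Assumption: n is even, since each pass handles two terms. */
float dot_two_sums(const float *a, const float *b, size_t n)
{
    assert(n % 2 == 0);
    float sum_even = 0.0f;                 /* plays the role of A7 */
    float sum_odd  = 0.0f;                 /* plays the role of B7 */
    for (size_t i = 0; i < n; i += 2) {    /* n/2 passes through the loop */
        sum_even += a[i]     * b[i];
        sum_odd  += a[i + 1] * b[i + 1];
    }
    return sum_even + sum_odd;             /* final sum of even and odd terms */
}

/* Hypothetical helper: cycles spent in the loop body only,
   i.e. cycles per pass multiplied by the number of passes. */
unsigned loop_cycles(unsigned cycles_per_pass, unsigned passes)
{
    return cycles_per_pass * passes;
}
```

With 200 terms, the loop runs 100 passes, so loop_cycles(10, 100) reproduces the 1000-cycle figure quoted above.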

8.5 SOFTWARE PIPELINING FOR CODE OPTIMIZATION

Software pipelining is a scheme for writing efficient ASM code in which all the functional units are utilized within one cycle. Optimization levels -o2 and -o3 enable the code generator to generate (or attempt to generate) software-pipelined code.

There are three stages associated with software pipelining:

1. Prolog (warm-up). This stage contains instructions needed to build up the loop kernel (cycle).

2. Loop kernel (cycle). Within this loop, all instructions are executed in parallel. The entire loop kernel can be executed in one cycle, since all the instructions within the loop kernel stage are in parallel.

3. Epilog (cool-off). This stage contains the instructions necessary to complete all iterations.
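The three stages can be modeled in C for a sum of products. This is a minimal sketch under simplifying assumptions, not the generated or hand-coded ASM: the function name dot_pipelined is hypothetical, each stage (load, multiply, add) is treated as taking one step, and the kernel statements run sequentially here, whereas on the DSP they would issue in parallel. The statements are ordered add, multiply, load so that each one uses only values produced in an earlier step.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a software-pipelined sum of products.
   Assumption: n >= 2, so the prolog can start two iterations. */
float dot_pipelined(const float *a, const float *b, size_t n)
{
    assert(n >= 2);
    float sum = 0.0f;
    float la, lb;   /* operands loaded in the previous step */
    float prod;     /* product formed in the previous step */

    /* Prolog (warm-up): start iterations 0 and 1. */
    la = a[0]; lb = b[0];        /* load 0 */
    prod = la * lb;              /* multiply 0 */
    la = a[1]; lb = b[1];        /* load 1 */

    /* Loop kernel: three different iterations are in flight per pass. */
    for (size_t i = 2; i < n; i++) {
        sum += prod;             /* add for iteration i-2 */
        prod = la * lb;          /* multiply for iteration i-1 */
        la = a[i]; lb = b[i];    /* load for iteration i */
    }

    /* Epilog (cool-off): finish the iterations still in flight. */
    sum += prod;                 /* add for iteration n-2 */
    prod = la * lb;              /* multiply for iteration n-1 */
    sum += prod;                 /* add for iteration n-1 */
    return sum;
}
```

The result matches a plain loop over all n terms; only the order in which loads, multiplies, and adds are interleaved changes.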

8.5.1 Procedure for Hand-Coded Software Pipelining

1. Draw a dependency graph.

2. Set up a scheduling table.

3. Obtain code from the scheduling table.
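As a rough illustration of steps 1 and 2, a dependency graph can be turned into the first column of a scheduling table by computing the earliest cycle at which each instruction can issue. The sketch below is an assumption-laden model rather than the procedure as the text carries it out: the struct node type and earliest_cycles function are hypothetical, each node has at most one predecessor, and resource conflicts between functional units are ignored. The delay-slot values used in the example are 3 for MPYSP and ADDSP, as in the listing's NOP 3 comments, and 4 for LDDW, consistent with the four cycles the listing places between LDDW and MPYSP.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical node of a dependency graph (single-predecessor chain). */
struct node {
    const char *name;   /* instruction mnemonic, for readability only */
    int delay_slots;    /* cycles after issue before the result is usable */
    int pred;           /* index of the predecessor node, or -1 if none */
};

/* Fill start[i] with the earliest 1-based cycle at which node i can issue.
   Assumption: every predecessor is listed before its successor. */
void earliest_cycles(const struct node *g, size_t n, int *start)
{
    for (size_t i = 0; i < n; i++) {
        if (g[i].pred < 0) {
            start[i] = 1;                        /* no dependency: cycle 1 */
        } else {
            assert((size_t)g[i].pred < i);
            int p = g[i].pred;
            /* Issue one cycle after the predecessor's delay slots expire. */
            start[i] = start[p] + g[p].delay_slots + 1;
        }
    }
}
```

For the chain LDDW, MPYSP, ADDSP this places the load in cycle 1, the multiply in cycle 6, and the add in cycle 10; a full scheduling table would then repeat each entry in subsequent cycles and assign functional units.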
