Code Optimization - Digital Signal Processing and Applications with the C6713 and C6416 DSK

;twosumfix.asm ASM code for two sums of products with word-wide data
;for fixed-point implementation

        MVK   .S1  100,A1     ;count/2 into A1
||      ZERO  .L1  A7         ;init A7 for accum of even terms
||      ZERO  .L2  B7         ;init B7 for accum of odd terms
LOOP    LDW   .D1  *A4++,A2   ;A2=32-bit data pointed by A4
||      LDW   .D2  *B4++,B2   ;B2=32-bit data pointed by B4
        SUB   .S1  A1,1,A1    ;decrement count
  [A1]  B     .S1  LOOP       ;branch to LOOP (after ADD)
        NOP   2               ;delay slots for both LDW and B
        MPY   .M1x A2,B2,A6   ;lower 16-bit product in A6
||      MPYH  .M2x A2,B2,B6   ;upper 16-bit product in B6
        NOP                   ;1 delay slot for MPY/MPYH
        ADD   .L1  A6,A7,A7   ;accum even terms in A7
||      ADD   .L2  B6,B7,B7   ;accum odd terms in B7
                              ;branch occurs here

FIGURE 8.7. ASM code for two sums of products with 32-bit data for fixed-point implementation (twosumfix.asm).

per iteration. The instruction LDW loads a word or 32-bit data. The multiply instruction MPY finds the product of the lower 16 × 16 data, and MPYH finds the product of the upper 16 × 16 data. The two ADD instructions accumulate separately the even and odd sums of products. Note that an additional ADD instruction is needed outside the loop to accumulate A7 and B7. The instructions within the loop consume eight cycles, now using 100 iterations (not 200), to yield 8 × 100 = 800 cycles.
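As a rough illustration of the two-accumulator scheme described above, the following C sketch models what the loop computes; it is not the book's code. The function names two_sum_dot, pack16, and loop_cycles are hypothetical, and the packing order (even-indexed sample in the lower half-word, odd-indexed sample in the upper half-word) is an assumption about the data layout.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: pack two 16-bit samples into one 32-bit word,
   lo in bits 0-15 and hi in bits 16-31 (assumed layout). */
uint32_t pack16(int16_t lo, int16_t hi)
{
    return ((uint32_t)(uint16_t)hi << 16) | (uint16_t)lo;
}

/* Hypothetical C model of the loop in twosumfix.asm. Each iteration reads
   one word from each array and forms two 16 x 16 products. */
int32_t two_sum_dot(const uint32_t *a, const uint32_t *b, int n_words)
{
    int32_t even = 0;   /* plays the role of A7 */
    int32_t odd  = 0;   /* plays the role of B7 */

    assert(n_words >= 0);
    for (int i = 0; i < n_words; i++) {
        /* Narrowing to int16_t assumes the usual two's-complement wraparound. */
        int16_t a_lo = (int16_t)(a[i] & 0xFFFFu);
        int16_t b_lo = (int16_t)(b[i] & 0xFFFFu);
        int16_t a_hi = (int16_t)(a[i] >> 16);
        int16_t b_hi = (int16_t)(b[i] >> 16);

        even += (int32_t)a_lo * b_lo;   /* MPY:  lower halves */
        odd  += (int32_t)a_hi * b_hi;   /* MPYH: upper halves */
    }
    return even + odd;  /* the additional ADD outside the loop */
}

/* Loop cycle estimate as used in the text: cycles per iteration
   multiplied by the number of iterations. */
int loop_cycles(int cycles_per_iter, int iterations)
{
    return cycles_per_iter * iterations;
}
```

With 200 sample pairs packed into 100 words, two_sum_dot(a, b, 100) would run 100 iterations, and loop_cycles(8, 100) reproduces the 800-cycle figure quoted above.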

Example 8.8: Dot Product with No Parallel Instructions for Floating-Point Implementation Using ASM Code (dotpnpfloat)

Figure 8.8 shows the ASM code dotpnpfloat.asm for the dot product with a floating-point implementation using no instructions in parallel. The loop iterates 200

;dotpnpfloat.asm ASM code with no parallel instructions for floating-pt

        MVK   .S1  200,A1     ;count into A1
        ZERO  .L1  A7         ;init A7 for accum
LOOP    LDW   .D1  *A4++,A2   ;A2=32-bit data pointed by A4
        LDW   .D1  *A8++,A3   ;A3=32-bit data pointed by A8
        NOP   4               ;4 delay slots for LDW
        MPYSP .M1  A2,A3,A6   ;product in A6
        NOP   3               ;3 delay slots for MPYSP
        ADDSP .L1  A6,A7,A7   ;accum in A7
        SUB   .S1  A1,1,A1    ;decrement count
  [A1]  B     .S2  LOOP       ;branch to LOOP
        NOP   5               ;5 delay slots for B

FIGURE 8.8. ASM code with no parallel instructions for floating-point implementation (dotpnpfloat.asm).
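For comparison, a minimal C sketch of what the loop in Figure 8.8 computes, together with a cycle tally read directly off the listing, might look as follows. The function names dotp_float and dotpnpfloat_cycles_per_iter are hypothetical, and the tally assumes each instruction issues in one cycle and each NOP n occupies n cycles.

```c
#include <assert.h>

/* Hypothetical C equivalent of the loop in dotpnpfloat.asm: one
   multiply-accumulate per iteration, with no parallelism. */
float dotp_float(const float *a, const float *b, int count)
{
    float sum = 0.0f;               /* plays the role of A7 */

    assert(count >= 0);
    for (int i = 0; i < count; i++)
        sum += a[i] * b[i];         /* MPYSP followed by ADDSP */
    return sum;
}

/* Cycles for one pass through the loop, tallied from the listing under
   the assumption of one cycle per instruction and n cycles per NOP n. */
int dotpnpfloat_cycles_per_iter(void)
{
    int ldw        = 2;  /* two LDW instructions, issued one after another */
    int nop_ldw    = 4;  /* NOP 4 */
    int mpysp      = 1;
    int nop_mpysp  = 3;  /* NOP 3 */
    int addsp      = 1;
    int sub        = 1;
    int branch     = 1;
    int nop_branch = 5;  /* NOP 5 */

    return ldw + nop_ldw + mpysp + nop_mpysp
         + addsp + sub + branch + nop_branch;
}
```

Under these assumptions the tally comes to 18 cycles per pass, most of them spent in NOPs, which is the cost that placing instructions in parallel or inside delay slots would aim to reduce.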
