Digital Signal Processing Reference
;twosumfix.asm ASM code for two sums of products with word-wide data
;for fixed-point implementation
        MVK   .S1  100,A1       ;count/2 into A1
||      ZERO  .L1  A7           ;init A7 for accum of even terms
||      ZERO  .L2  B7           ;init B7 for accum of odd terms
LOOP    LDW   .D1  *A4++,A2     ;A2=32-bit data pointed by A4
||      LDW   .D2  *B4++,B2     ;B2=32-bit data pointed by B4
        SUB   .S1  A1,1,A1      ;decrement count
  [A1]  B     .S1  LOOP         ;branch to LOOP (after ADD)
        NOP   2                 ;delay slots for both LDW and B
        MPY   .M1x A2,B2,A6     ;lower 16-bit product in A6
||      MPYH  .M2x A2,B2,B6     ;upper 16-bit product in B6
        NOP                     ;1 delay slot for MPY/MPYH
        ADD   .L1  A6,A7,A7     ;accum even terms in A7
||      ADD   .L2  B6,B7,B7     ;accum odd terms in B7
                                ;branch occurs here
FIGURE 8.7. ASM code for two sums of products with 32-bit data for fixed-point
implementation (twosumfix.asm).
per iteration. The instruction LDW loads a word, or 32 bits of data. The multiply
instruction MPY finds the product of the lower 16 × 16 data, and MPYH finds the
product of the upper 16 × 16 data. The two ADD instructions accumulate the even
and odd sums of products separately. Note that an additional ADD instruction is
needed outside the loop to add A7 and B7. The instructions within the loop consume
eight cycles, now using 100 iterations (not 200), to yield 8 × 100 = 800 cycles.
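The arithmetic performed by the loop in Figure 8.7 can be modeled in C. The following is only a sketch under stated assumptions, not the book's code: the function names twosum_fixed and twosum_loop_cycles are hypothetical, each 32-bit word is assumed to pack two signed 16-bit samples, and the lower half is assumed to hold the even-indexed sample (little-endian ordering).

```c
#include <stdint.h>

/* Hypothetical C model of the loop in twosumfix.asm.
   a and b each hold n_words 32-bit words; every word is assumed to pack
   two signed 16-bit samples (lower half = even term, upper half = odd term). */
int32_t twosum_fixed(const uint32_t *a, const uint32_t *b, int n_words)
{
    int32_t even = 0;   /* plays the role of A7 */
    int32_t odd  = 0;   /* plays the role of B7 */

    for (int i = 0; i < n_words; i++) {
        /* Split each word into its lower and upper 16-bit halves. */
        int16_t a_lo = (int16_t)(a[i] & 0xFFFFu);
        int16_t a_hi = (int16_t)(a[i] >> 16);
        int16_t b_lo = (int16_t)(b[i] & 0xFFFFu);
        int16_t b_hi = (int16_t)(b[i] >> 16);

        even += (int32_t)a_lo * b_lo;   /* MPY:  lower 16 x 16 product */
        odd  += (int32_t)a_hi * b_hi;   /* MPYH: upper 16 x 16 product */
    }

    /* The additional ADD outside the loop that combines A7 and B7. */
    return even + odd;
}

/* Cycle estimate used in the text: cycles per pass times number of passes. */
int twosum_loop_cycles(int cycles_per_pass, int passes)
{
    return cycles_per_pass * passes;
}
```

Calling twosum_loop_cycles(8, 100) reproduces the 800-cycle figure quoted above. Because each pass handles two samples, 100 passes cover the same 200 terms that a one-sample-per-pass loop would need 200 passes for.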
Example 8.8: Dot Product with No Parallel Instructions for Floating-Point
Implementation Using ASM Code (dotpnpfloat)
Figure 8.8 shows the ASM code dotpnpfloat.asm for the dot product with a
floating-point implementation using no instructions in parallel. The loop iterates 200
;dotpnpfloat.asm ASM Code with no parallel instructions for floating-pt
        MVK    .S1  200,A1      ;count into A1
        ZERO   .L1  A7          ;init A7 for accum
LOOP    LDW    .D1  *A4++,A2    ;A2=32-bit data pointed by A4
        LDW    .D1  *A8++,A3    ;A3=32-bit data pointed by A8
        NOP    4                ;4 delay slots for LDW
        MPYSP  .M1  A2,A3,A6    ;product in A6
        NOP    3                ;3 delay slots for MPYSP
        ADDSP  .L1  A6,A7,A7    ;accum in A7
        SUB    .S1  A1,1,A1     ;decrement count
  [A1]  B      .S2  LOOP        ;branch to LOOP
        NOP    5                ;5 delay slots for B
FIGURE 8.8. ASM code with no parallel instructions for floating-point implementation
(dotpnpfloat.asm).
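For comparison, the computation in Figure 8.8 can be written as a short C model. This is a sketch only: the name dotp_float and the constant DOTPNPFLOAT_CYCLES_PER_PASS are hypothetical, and the cycle tally is simply a count taken from the listing, assuming each instruction issues in one cycle, NOP n occupies n cycles, and there are no memory stalls.

```c
#include <stddef.h>

/* Hypothetical C model of the loop in dotpnpfloat.asm:
   one single-precision multiply and one accumulate per pass. */
float dotp_float(const float *a, const float *b, size_t n)
{
    float sum = 0.0f;                 /* plays the role of A7 */

    for (size_t i = 0; i < n; i++) {
        float prod = a[i] * b[i];     /* MPYSP: product in A6 */
        sum += prod;                  /* ADDSP: accumulate in A7 */
    }
    return sum;
}

/* Cycles per pass tallied from the listing (an assumption-laden count):
   2 LDW + NOP 4 + MPYSP + NOP 3 + ADDSP + SUB + B + NOP 5. */
enum { DOTPNPFLOAT_CYCLES_PER_PASS = 2 + 4 + 1 + 3 + 1 + 1 + 1 + 5 };
```

With no instructions in parallel, every delay slot is filled by a NOP, which is why the tally per pass is much larger than the eight cycles of the loop in Figure 8.7.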