Digital Signal Processing Reference
In-Depth Information
be performed with all C6x devices, whereas a floating-point implementation
requires a C67x platform such as the C6713 DSK.
The loop iterates 200 times. With a fixed-point implementation, the pointer
registers A4 and A8 each increment to point at the next half-word (16 bits) in
their respective buffers, whereas with a floating-point implementation, each
pointer register increments to the next 32-bit word. The load, multiply, and
branch instructions must use the .D, .M, and .S units, respectively; the add and
subtract instructions can use any unit except .M. The instructions within the
loop consume 16 cycles per iteration.
This yields 16 × 200 = 3200 cycles. Table 8.4 shows a summary of several
optimization schemes for both fixed- and floating-point implementations.
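For reference, the multiply-accumulate that the fixed-point loop carries out can be sketched in C. This is only an illustrative sketch, not code from the text: the function name dotp_fixed, the parameter names, and the choice of a 32-bit accumulator are assumptions made here. It shows the half-word (16-bit) loads, the 16 × 16 multiply, and the running sum that the load, multiply, and add instructions perform on each pass.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical C reference for the fixed-point dot product.
 * a, b  : buffers of 16-bit (half-word) samples, as loaded by LDH
 * count : number of products to accumulate (200 in the text)
 * Assumption: the running sum fits in 32 bits; no saturation is applied. */
static int32_t dotp_fixed(const int16_t *a, const int16_t *b, int count)
{
    assert(a != NULL && b != NULL);

    int32_t sum = 0;                      /* accumulator, starts at zero   */
    for (int i = 0; i < count; i++) {
        /* 16 x 16 -> 32-bit product, then accumulate (MPY followed by ADD) */
        sum += (int32_t)a[i] * (int32_t)b[i];
    }
    return sum;
}
```

Calling dotp_fixed(x, y, 200) on two 200-element int16_t buffers corresponds to the 200 loop iterations described above; a floating-point variant would use float buffers and step through 32-bit words instead.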
Example 8.6: Dot Product with Parallel Instructions for Fixed-Point
Implementation Using ASM Code ( dotpp )
Figure 8.6 shows the ASM code dotpp.asm for the dot product with a fixed-point
implementation with instructions in parallel. Because useful instructions now
occupy delay slots that were previously padded with NOPs, the number of NOPs is
reduced.
The MPY instruction uses a cross-path (with .M1x) since the two operands are
from different register files or different paths. The instructions SUB and B are
moved up to fill some of the delay slots required by LDH. The branch takes effect
after the ADD instruction. Using parallel instructions, the instructions within
the loop now consume eight cycles per iteration, to yield 8 × 200 = 1600 cycles.
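The eight-cycle figure can be checked by tallying the execute packets in the loop body of Figure 8.6. The following C sketch is an illustration under stated assumptions: the struct packet type, the table dotpp_loop, and the helper functions are hypothetical names introduced here, and the model simply charges one cycle per execute packet and n cycles for "NOP n", ignoring the MVK/ZERO setup outside the loop as the text does.

```c
#include <stddef.h>

/* One execute packet of the loop body: its text and the cycles it occupies
 * (1 for an ordinary packet, n for "NOP n"). Hypothetical helper type. */
struct packet {
    const char *text;
    int cycles;
};

/* Loop body of dotpp.asm, transcribed from Figure 8.6. */
static const struct packet dotpp_loop[] = {
    { "LDH || LDH", 1 },   /* two parallel loads share one cycle */
    { "SUB",        1 },
    { "[A1] B",     1 },
    { "NOP 2",      2 },
    { "MPY",        1 },
    { "NOP",        1 },
    { "ADD",        1 },
};
#define DOTPP_LOOP_LEN (sizeof dotpp_loop / sizeof dotpp_loop[0])

/* Sum the cycles of all packets in one loop iteration. */
static int cycles_per_iteration(const struct packet *p, size_t n)
{
    int total = 0;
    for (size_t i = 0; i < n; i++)
        total += p[i].cycles;
    return total;
}

/* Cycles that elapse after packet idx within the iteration, i.e. how many
 * delay slots the following packets cover for the instruction at idx. */
static int cycles_after(const struct packet *p, size_t n, size_t idx)
{
    int total = 0;
    for (size_t i = idx + 1; i < n; i++)
        total += p[i].cycles;
    return total;
}

/* Total loop cycles for a given iteration count. */
static long total_cycles(const struct packet *p, size_t n, long iterations)
{
    return (long)cycles_per_iteration(p, n) * iterations;
}
```

With this table, cycles_per_iteration returns 8 and total_cycles for 200 iterations returns 1600, matching the text. cycles_after for the branch packet (index 2) gives 5, which is the number of delay slots a C6x branch requires, so the branch takes effect right after ADD.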
Example 8.7: Two Sums of Products with Word-Wide (32-Bit) Data for
Fixed-Point Implementation Using ASM Code ( twosumfix )
Figure 8.7 shows the ASM code twosumfix.asm, which calculates two separate
sums of products using word-wide access of data for a fixed-point implementation.
The loop count is initialized to 100 (not 200) since two sums of products are obtained
; dotpp.asm ASM Code with parallel instructions, fixed-point
        MVK   .S1   200,A1      ;count into A1
||      ZERO  .L1   A7          ;init A7 for accum
LOOP    LDH   .D1   *A4++,A2    ;A2=16-bit data pointed by A4
||      LDH   .D2   *B4++,B2    ;B2=16-bit data pointed by B4
        SUB   .S1   A1,1,A1     ;decrement count
 [A1]   B     .S1   LOOP        ;branch to LOOP (after ADD)
        NOP   2                 ;delay slots for LDH and B
        MPY   .M1x  A2,B2,A6    ;product in A6
        NOP                     ;1 delay slot for MPY
        ADD   .L1   A6,A7,A7    ;accum in A7,then branch
                                ;branch occurs here

FIGURE 8.6. ASM code with parallel instructions for fixed-point implementation.