Code Optimization - Digital Signal Processing and Applications with the C6713 and C6416 DSK


be performed with all C6x devices, whereas a floating-point implementation requires a C67x platform such as the C6713 DSK.

The loop iterates 200 times. With a fixed-point implementation, each of the pointer registers A4 and A8 is incremented to point at the next half-word (16 bits) in its buffer. With a floating-point implementation, each pointer register is instead incremented to the next 32-bit word. The load, multiply, and branch instructions must use the .D, .M, and .S units, respectively; the add and subtract instructions can use any unit except .M. The instructions within the loop consume 16 cycles per iteration, which yields 16 × 200 = 3200 cycles. Table 8.4 summarizes several optimization schemes for both fixed- and floating-point implementations.

Example 8.6: Dot Product with Parallel Instructions for Fixed-Point Implementation Using ASM Code (dotpp)

Figure 8.6 shows the ASM code dotpp.asm for the fixed-point dot product with instructions in parallel. Useful instructions now occupy cycles that previously held NOPs, so fewer NOPs are needed.

The MPY instruction uses a cross path (hence .M1x) because its two operands come from different register files: A2 from the A side and B2 from the B side. The two LDH instructions execute in parallel on .D1 and .D2. The SUB and B instructions are moved up to fill some of the four delay slots required by LDH. Because B has five delay slots of its own, the branch takes effect only after the ADD instruction. With parallel instructions, the loop body now consumes eight cycles per iteration, which yields 8 × 200 = 1600 cycles.

; dotpp.asm ASM Code with parallel instructions, fixed-point

        MVK   .S1   200,A1      ;count into A1
||      ZERO  .L1   A7          ;init A7 for accum
LOOP    LDH   .D1   *A4++,A2    ;A2=16-bit data pointed by A4
||      LDH   .D2   *B4++,B2    ;B2=16-bit data pointed by B4
        SUB   .S1   A1,1,A1     ;decrement count
  [A1]  B     .S1   LOOP        ;branch to LOOP (after ADD)
        NOP   2                 ;delay slots for LDH and B
        MPY   .M1x  A2,B2,A6    ;product in A6
        NOP                     ;1 delay slot for MPY
        ADD   .L1   A6,A7,A7    ;accum in A7, then branch
                                ;branch occurs here

FIGURE 8.6. ASM code with parallel instructions for fixed-point implementation.

Example 8.7: Two Sums of Products with Word-Wide (32-Bit) Data for Fixed-Point Implementation Using ASM Code (twosumfix)

Figure 8.7 shows the ASM code twosumfix.asm, which calculates two separate sums of products using word-wide access of data for a fixed-point implementation. The loop count is initialized to 100 (not 200) since two sums of products are obtained
