Code Optimization - Digital Signal Processing and Applications with the C6713 and C6416 DSK


Example 8.12: Dot Product Using Software Pipelining for a Floating-Point Implementation

This example implements the dot product using software pipelining for a
floating-point implementation. Table 8.3 shows a floating-point version of
Table 8.2. LDW becomes LDDW, MPY/MPYH become MPYSP, and ADD becomes ADDSP.
Both MPYSP and ADDSP have three delay slots. As a result, the loop kernel
starts in cycle 10 in lieu of cycle 8. The SUB and B instructions start in
cycles 4 and 5, respectively, in lieu of cycles 2 and 3. ADDSP starts in
cycle 10 in lieu of cycle 8. The software pipeline for a floating-point
implementation is 10 deep.
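The cycle numbers quoted above can be reproduced with a little arithmetic. The following is only a sketch in C, not code from the book. It assumes the load issues in cycle 1, that LDW/LDDW have four delay slots, MPY has one, and B has five (these counts are not stated in this section), and that the branch is issued so that it takes effect right after the first kernel cycle. The names Schedule and pipeline_schedule are hypothetical.

```c
#include <assert.h>

/* Delay slots assumed for the instructions involved (assumptions, see text). */
enum {
    LOAD_DELAY   = 4,  /* LDW / LDDW */
    MPY_DELAY    = 1,  /* MPY / MPYH, fixed point */
    MPYSP_DELAY  = 3,  /* MPYSP, floating point (stated in the text) */
    BRANCH_DELAY = 5   /* B */
};

/* Cycles in which SUB, B, and the loop kernel (first add) start. */
typedef struct {
    int sub_cycle;
    int branch_cycle;
    int kernel_cycle;
} Schedule;

/* Hypothetical helper: derive the schedule from the multiply delay slots. */
Schedule pipeline_schedule(int mul_delay)
{
    assert(mul_delay >= 0);

    Schedule s;
    int mul_cycle = 1 + LOAD_DELAY + 1;           /* load issues in cycle 1   */
    s.kernel_cycle = mul_cycle + mul_delay + 1;   /* first add, kernel start  */
    s.branch_cycle = s.kernel_cycle - BRANCH_DELAY;
    s.sub_cycle    = s.branch_cycle - 1;          /* counter updated before B */
    return s;
}
```

Under these assumptions, pipeline_schedule(MPY_DELAY) gives cycles 2, 3, and 8 for SUB, B, and the kernel, and pipeline_schedule(MPYSP_DELAY) gives 4, 5, and 10, matching the fixed-point and floating-point figures in the text.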

Figure 8.14 shows the ASM code dotpipedfloat.asm, which implements the
floating-point version of the dot product. Since ADDSP has three delay slots,
the accumulation is staggered by four. The accumulation associated with one
of the ADDSP instructions at each loop cycle follows:

Loop Cycle   Accumulator (one ADDSP)
    1        0
    2        0
    3        0
    4        0
    5        p0                          ;first product
    6        p1                          ;second product
    7        p2
    8        p3
    9        p0 + p4                     ;sum of first and fifth products
   10        p1 + p5                     ;sum of second and sixth products
   11        p2 + p6
   12        p3 + p7
   13        p0 + p4 + p8                ;sum of first, fifth, and ninth products
   14        p1 + p5 + p9
   15        p2 + p6 + p10
   16        p3 + p7 + p11
   17        p0 + p4 + p8 + p12
   ...
   99        p2 + p6 + p10 + ... + p94
  100        p3 + p7 + p11 + ... + p95
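The staggered pattern in the table can be modeled in plain C. This is a minimal sketch, not the book's dotpipedfloat.asm: the function names accumulator_at_cycle and dot_staggered and the constant STAGGER are hypothetical, and loop cycles are counted from 1 as in the table. The first function returns the value the table lists for a given loop cycle. The second shows the same idea as four independent partial sums that are combined once at the end.

```c
#include <assert.h>
#include <stddef.h>

/* ADDSP has three delay slots, so a sum is usable four cycles after issue. */
#define STAGGER 4

/* Value listed in the table for a 1-based loop cycle: 0 for cycles 1..4,
   then the sum of every fourth product ending at p[cycle - 5].
   Products past p[n - 1] are ignored. */
float accumulator_at_cycle(const float *p, size_t n, size_t cycle)
{
    assert(cycle >= 1);
    if (cycle <= STAGGER)
        return 0.0f;

    size_t last = cycle - (STAGGER + 1);   /* index of the newest product */
    float sum = 0.0f;
    for (size_t k = last % STAGGER; k <= last && k < n; k += STAGGER)
        sum += p[k];
    return sum;
}

/* Dot product with four interleaved partial sums, combined at the end. */
float dot_staggered(const float *a, const float *b, size_t n)
{
    float acc[STAGGER] = {0.0f, 0.0f, 0.0f, 0.0f};
    for (size_t i = 0; i < n; i++)
        acc[i % STAGGER] += a[i] * b[i];
    return (acc[0] + acc[1]) + (acc[2] + acc[3]);
}
```

For example, accumulator_at_cycle(p, n, 9) returns p[0] + p[4], the entry for loop cycle 9. Because floating-point addition is not associative, the four-way split can round slightly differently from a single running sum.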

The table shows the accumulation against the loop cycle. The actual cycle is
shifted by 9 (the number of cycles in the prolog section). Note that the first product, p0, is
