Digital Signal Processing Reference
Example 8.12: Dot Product Using Software Pipelining for
a Floating-Point Implementation
This example implements a floating-point version of the dot product using
software pipelining. Table 8.3 shows a floating-point version of Table 8.2. LDW
becomes LDDW, MPY/MPYH become MPYSP, and ADD becomes ADDSP. Both MPYSP
and ADDSP have three delay slots. As a result, the loop kernel starts in cycle 10
instead of cycle 8. The SUB and B instructions start in cycles 4 and 5, respectively,
instead of cycles 2 and 3. ADDSP starts in cycle 10 instead of cycle 8. The software
pipeline for the floating-point implementation is 10 deep.
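The cycle numbers quoted above can be reproduced with a little arithmetic on delay slots. The following is a minimal sketch, not the book's method: the function name `schedule` and the dictionary keys are hypothetical, and only the three delay slots of MPYSP come from the text. The other delay-slot counts (4 for LDW/LDDW, 1 for MPY/MPYH, 5 for B) are assumptions based on the usual C6x instruction timings.

```python
# Delay slots. MPYSP = 3 is stated in the text; the remaining values are
# assumptions taken from the usual C6x instruction timings.
LOAD_DELAY = 4                          # LDW / LDDW (assumed)
BRANCH_DELAY = 5                        # B (assumed)
MPY_DELAY = {"fixed": 1, "float": 3}    # MPY/MPYH (assumed) vs. MPYSP

def schedule(kind):
    """Return the cycle in which each instruction first issues."""
    load = 1                                  # first load issues in cycle 1
    multiply = load + LOAD_DELAY + 1          # operands must have arrived
    add = multiply + MPY_DELAY[kind] + 1      # first add = first kernel cycle
    # The branch has to take effect in the cycle after the first kernel cycle.
    branch = (add + 1) - (BRANCH_DELAY + 1)
    sub = branch - 1                          # loop counter updated just before B
    return {"load": load, "multiply": multiply, "add": add,
            "sub": sub, "branch": branch}

for kind in ("fixed", "float"):
    print(kind, schedule(kind))
```

Under these assumptions the fixed-point schedule gives SUB, B, and ADD in cycles 2, 3, and 8, and the floating-point schedule gives cycles 4, 5, and 10, matching the values in the text.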
Figure 8.14 shows the ASM code dotpipedfloat.asm, which implements the
floating-point version of the dot product. Since ADDSP has three delay slots,
the accumulation is staggered by four. The accumulation associated with one of the
ADDSP instructions at each loop cycle follows:
Loop Cycle   Accumulator (one ADDSP)       Comment
1            0
2            0
3            0
4            0
5            p0                            ;first product
6            p1                            ;second product
7            p2
8            p3
9            p0 + p4                       ;sum of first and fifth products
10           p1 + p5                       ;sum of second and sixth products
11           p2 + p6
12           p3 + p7
13           p0 + p4 + p8                  ;sum of first, fifth, and ninth products
14           p1 + p5 + p9
15           p2 + p6 + p10
16           p3 + p7 + p11
17           p0 + p4 + p8 + p12
...          ...
99           p2 + p6 + p10 + ... + p94
100          p3 + p7 + p11 + ... + p95
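The staggering in the table above can be modeled in a few lines. This is only a sketch under stated assumptions, not the code in dotpipedfloat.asm: the function names are hypothetical, the first product is assumed to appear at loop cycle 5, and each new product is assumed to be added to the accumulator value from four loop cycles earlier.

```python
def staggered_terms(loop_cycle, stagger=4, first_cycle=5):
    """Indices of the products summed in the accumulator at a loop cycle."""
    newest = loop_cycle - first_cycle        # index of the newest product
    if newest < 0:
        return []                            # accumulator still holds 0
    return list(range(newest % stagger, newest + 1, stagger))

def describe(loop_cycle):
    """Format one table entry, e.g. 'p0 + p4'."""
    terms = staggered_terms(loop_cycle)
    return " + ".join(f"p{i}" for i in terms) if terms else "0"

def accumulate(products, n_cycles, stagger=4, first_cycle=5):
    """Numeric version: acc[n] is the accumulator value at loop cycle n."""
    acc = [0.0] * (n_cycles + 1)             # index 0 is unused
    for n in range(first_cycle, n_cycles + 1):
        previous = acc[n - stagger] if n - stagger >= 1 else 0.0
        acc[n] = previous + products[n - first_cycle]
    return acc

for cycle in (4, 5, 9, 13, 17):
    print(cycle, describe(cycle))
```

In this model the accumulator carries four interleaved partial sums, so adding the values at four consecutive loop cycles (for example 97 through 100) gives the sum of all products accumulated so far.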
This accumulation is indexed by loop cycle. The actual cycle is shifted by 9
(the number of cycles in the prolog section). Note that the first product, p0, is