Digital Signal Processing Reference
Example 8.12: Dot Product Using Software Pipelining for a Floating-Point Implementation
This example implements the dot product using software pipelining for a floating-point implementation. Table 8.3 shows a floating-point version of Table 8.2. LDW becomes LDDW, MPY/MPYH become MPYSP, and ADD becomes ADDSP. Both MPYSP and ADDSP have three delay slots. As a result, the loop kernel starts in cycle 10 instead of cycle 8. The SUB and B instructions start in cycles 4 and 5, respectively, instead of cycles 2 and 3. ADDSP starts in cycle 10 instead of cycle 8. The software pipeline for a floating-point implementation is therefore 10 deep.
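The cycle numbers above can be checked with a small scheduling calculation. The following Python sketch is only an illustration under stated assumptions, not the book's method: it assumes that loads have four delay slots, that the fixed-point MPY has one delay slot, that B has five delay slots, and that SUB is issued one cycle before B. None of these values except the three delay slots of MPYSP is stated in this section, and the function name `pipeline_schedule` is hypothetical.

```python
def pipeline_schedule(mpy_delay_slots, load_delay_slots=4, branch_delay_slots=5):
    """Earliest start cycles for a load -> multiply -> add dependency chain.

    Assumptions (not taken from this section): the load issues in cycle 1,
    loads have 4 delay slots, B has 5 delay slots, and SUB is issued one
    cycle before B.
    """
    load_start = 1
    # A dependent instruction can issue one cycle after the last delay slot.
    mpy_start = load_start + load_delay_slots + 1
    add_start = mpy_start + mpy_delay_slots + 1   # first cycle of the loop kernel
    # The branch must be issued so that its delay slots end at the kernel cycle.
    b_start = add_start - branch_delay_slots
    sub_start = b_start - 1
    return {"MPY": mpy_start, "ADD": add_start, "B": b_start, "SUB": sub_start}


fixed_point = pipeline_schedule(mpy_delay_slots=1)     # MPY/MPYH (assumed 1 delay slot)
floating_point = pipeline_schedule(mpy_delay_slots=3)  # MPYSP (3 delay slots)

print("fixed-point:   ", fixed_point)
print("floating-point:", floating_point)
```

Under these assumptions the fixed-point chain places the add in cycle 8 with SUB and B in cycles 2 and 3, and the floating-point chain places ADDSP in cycle 10 with SUB and B in cycles 4 and 5, which agrees with the figures quoted in the text.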
Figure 8.14 shows the ASM code dotpipedfloat.asm, which implements the floating-point version of the dot product. Since ADDSP has three delay slots, the accumulation is staggered by four. The accumulation associated with one of the ADDSP instructions at each loop cycle follows:
Loop Cycle   Accumulator (one ADDSP)
1            0
2            0
3            0
4            0
5            p0                          ;first product
6            p1                          ;second product
7            p2
8            p3
9            p0 + p4                     ;sum of first and fifth products
10           p1 + p5                     ;sum of second and sixth products
11           p2 + p6
12           p3 + p7
13           p0 + p4 + p8                ;sum of first, fifth, and ninth products
14           p1 + p5 + p9
15           p2 + p6 + p10
16           p3 + p7 + p11
17           p0 + p4 + p8 + p12
...          ...
99           p2 + p6 + p10 + ... + p94
100          p3 + p7 + p11 + ... + p95
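The staggering by four can be reproduced with a simplified model of a single ADDSP accumulator. The Python sketch below is an illustration only: it assumes that the result of an ADDSP issued in one loop cycle becomes visible four loop cycles later (three delay slots plus the issue cycle), and it tracks product indices symbolically rather than floating-point values. The names `accumulator_trace` and `format_sum` are hypothetical.

```python
def accumulator_trace(num_cycles, latency=4):
    """Map each loop cycle to the list of product indices held in the accumulator.

    Simplified model (an assumption): the ADDSP issued in loop cycle c adds
    product p(c-1) to the accumulator value it reads, and the result becomes
    visible in loop cycle c + latency.
    """
    trace = {}
    for cycle in range(1, num_cycles + 1):
        if cycle <= latency:
            trace[cycle] = []  # no result has landed yet, accumulator is 0
        else:
            issued = cycle - latency  # cycle in which the landing ADDSP was issued
            trace[cycle] = trace[issued] + [issued - 1]
    return trace


def format_sum(indices):
    """Render a list of product indices as 'p0 + p4 + ...', or '0' if empty."""
    return " + ".join(f"p{i}" for i in indices) or "0"


trace = accumulator_trace(100)
for cycle in (4, 5, 8, 9, 13, 17):
    print(cycle, format_sum(trace[cycle]))
```

In this model the accumulator holds four interleaved partial sums, one for each residue of the product index modulo 4, which is the pattern listed in the table.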
The accumulation listed above is indexed by loop cycle. The actual cycle is shifted by 9 (the number of cycles in the prolog section). Note that the first product, p0, is