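TABLE 8.1 Schedule Table of Dot Product Before Software Pipelining for Fixed-Point Implementation
| Units/Cycles | 1, 9, ...  | 2, 10, ... | 3, 11, ... | 4, 12, ... | 5, 13, ... | 6, 14, ... | 7, 15, ... | 8, 16, ... |
|--------------|------------|------------|------------|------------|------------|------------|------------|------------|
| .D1          | LDW        |            |            |            |            |            |            |            |
| .D2          | LDW        |            |            |            |            |            |            |            |
| .M1          |            |            |            |            |            | MPY        |            |            |
| .M2          |            |            |            |            |            | MPYH       |            |            |
| .L1          |            |            |            |            |            |            |            | ADD        |
| .L2          |            |            |            |            |            |            |            | ADD        |
| .S1          |            | SUB        |            |            |            |            |            |            |
| .S2          |            |            | B          |            |            |            |            |            |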
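TABLE 8.2 Schedule Table of Dot Product After Software Pipelining for Fixed-Point Implementation
| Loop stage   | Prolog | Prolog | Prolog | Prolog | Prolog | Prolog | Prolog | Kernel |
| Units/Cycles | 1      | 2      | 3      | 4      | 5      | 6      | 7      | 8      |
|--------------|--------|--------|--------|--------|--------|--------|--------|--------|
| .D1          | LDW    | LDW    | LDW    | LDW    | LDW    | LDW    | LDW    | LDW    |
| .D2          | LDW    | LDW    | LDW    | LDW    | LDW    | LDW    | LDW    | LDW    |
| .M1          |        |        |        |        |        | MPY    | MPY    | MPY    |
| .M2          |        |        |        |        |        | MPYH   | MPYH   | MPYH   |
| .L1          |        |        |        |        |        |        |        | ADD    |
| .L2          |        |        |        |        |        |        |        | ADD    |
| .S1          |        | SUB    | SUB    | SUB    | SUB    | SUB    | SUB    | SUB    |
| .S2          |        |        | B      | B      | B      | B      | B      | B      |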
From Table 8.1, the two LDW instructions are issued in parallel in cycles 1, 9, 17, and so on. The SUB instruction is issued in cycles 2, 10, 18, ..., followed by the branch (B) instruction in cycles 3, 11, 19, .... The two parallel instructions MPY and MPYH are issued in cycles 6, 14, 22, ..., and the two ADD instructions are issued in cycles 8, 16, 24, .... In other words, each instruction repeats every eight cycles, once per loop iteration.
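The gaps between these issue cycles come from instruction latencies. The following is a minimal Python sketch, assuming the standard C6x delay-slot counts (LDW: 4, MPY/MPYH: 1, B: 5, SUB/ADD: 0) and the 8-cycle loop body of Table 8.1; the helper names `ready_cycle` and `issue_cycles` are hypothetical, for illustration only.

```python
# Assumed C6x delay slots: a result is usable delay_slots + 1 cycles
# after the instruction issues.
DELAY_SLOTS = {"LDW": 4, "MPY": 1, "MPYH": 1, "ADD": 0, "SUB": 0, "B": 5}

LOOP_LENGTH = 8  # cycles per iteration before software pipelining (Table 8.1)


def ready_cycle(instr: str, issue: int) -> int:
    """First cycle in which the result of `instr` issued in `issue` can be used."""
    return issue + DELAY_SLOTS[instr] + 1


def issue_cycles(first: int, count: int = 3) -> list[int]:
    """Cycles in which an instruction issues during the first `count` iterations."""
    return [first + k * LOOP_LENGTH for k in range(count)]


# LDW issues in cycle 1, so the loaded words are ready for MPY/MPYH in cycle 6.
mpy_issue = ready_cycle("LDW", 1)
# MPY/MPYH issue in cycle 6, so the products are ready for ADD in cycle 8.
add_issue = ready_cycle("MPY", mpy_issue)
# B issues in cycle 3; after its 5 delay slots the branch takes effect,
# so the next iteration starts in cycle 9.
next_iteration = ready_cycle("B", 3)

print(mpy_issue, add_issue, next_iteration)  # 6 8 9
print(issue_cycles(1))          # LDW: [1, 9, 17]
print(issue_cycles(mpy_issue))  # MPY/MPYH: [6, 14, 22]
print(issue_cycles(add_issue))  # ADD: [8, 16, 24]
```

The branch is placed early (cycle 3) precisely so that its five delay slots overlap the rest of the loop body instead of adding idle cycles.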
Table 8.2 extends Table 8.1 to show the different stages: the prolog (cycles 1 through 7), the loop kernel (cycle 8), and the epilog (cycles 9, 10, ..., not shown). Once an instruction first appears in the prolog, it is repeated in every subsequent cycle up to and including the loop kernel. The instructions in the epilog stage (cycles 9, 10, ...) complete the functionality of the code.
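The rule that an instruction, once it appears, repeats in every later cycle can be checked mechanically. A rough Python sketch, assuming (as Table 8.2 does) that a new iteration starts every cycle with no resource conflicts; `FIRST_ISSUE`, `pipelined_row`, and `total_cycles` are hypothetical names, and the cycle count ignores loop setup and epilog handling. The 100-iteration figure is an arbitrary example, not a value from the text.

```python
# First-issue cycle of each unit's instruction for one iteration (Table 8.1).
FIRST_ISSUE = {
    ".D1": ("LDW", 1), ".D2": ("LDW", 1),
    ".M1": ("MPY", 6), ".M2": ("MPYH", 6),
    ".L1": ("ADD", 8), ".L2": ("ADD", 8),
    ".S1": ("SUB", 2), ".S2": ("B", 3),
}
LOOP_LENGTH = 8


def pipelined_row(unit: str) -> list[str]:
    """Row of Table 8.2 for `unit`, assuming a new iteration starts every cycle.

    Iteration j (j = 1, 2, ...) starts in cycle j, so an instruction that the
    first iteration issues in cycle c is issued again in every later cycle.
    """
    instr, first = FIRST_ISSUE[unit]
    return [instr if cycle >= first else "" for cycle in range(1, LOOP_LENGTH + 1)]


def total_cycles(iterations: int, pipelined: bool) -> int:
    """Cycles spent in the loop body only (no setup, no epilog trimming)."""
    if pipelined:
        # The last iteration starts in cycle `iterations` and needs
        # LOOP_LENGTH - 1 more cycles to finish.
        return iterations + LOOP_LENGTH - 1
    return iterations * LOOP_LENGTH


# The prolog lasts until the last instruction (ADD) first issues.
prolog_length = max(first for _, first in FIRST_ISSUE.values()) - 1
for unit in FIRST_ISSUE:
    print(unit, pipelined_row(unit))
print("prolog cycles:", prolog_length, "kernel cycle:", prolog_length + 1)
print(total_cycles(100, pipelined=False), total_cycles(100, pipelined=True))  # 800 107
```

Under these assumptions the kernel completes one iteration per cycle, so the loop cost drops from eight cycles per iteration to roughly one, plus the fixed seven-cycle prolog.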
From Table 8.2, efficient optimized code can be obtained: in the loop kernel, all eight functional units are busy in a single cycle. Note that a new iteration can start before the previous iterations have finished. Software pipelining determines when to start each new loop iteration.
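To show the overlap functionally, here is a Python sketch of a software-pipelined dot product with an explicit prolog, kernel, and epilog. It is not C6x code and not the book's implementation: it compresses the multi-cycle latencies into three stages (load, multiply, accumulate), so each kernel step works on three different iterations. It assumes, as the MPY/MPYH pairing in the tables suggests, that each 32-bit word loaded by LDW holds two signed 16-bit samples, with MPY multiplying the low halves and MPYH the high halves into two separate accumulators (the two ADDs). All function names are hypothetical.

```python
def lo16(word: int) -> int:
    """Signed value of the 16 least-significant bits (the MPY operand)."""
    v = word & 0xFFFF
    return v - 0x10000 if v & 0x8000 else v


def hi16(word: int) -> int:
    """Signed value of the 16 most-significant bits (the MPYH operand)."""
    return lo16(word >> 16)


def pack(lo: int, hi: int) -> int:
    """Pack two signed 16-bit values into one 32-bit word, as one LDW loads."""
    return ((hi & 0xFFFF) << 16) | (lo & 0xFFFF)


def dotp_pipelined(a: list[int], b: list[int]) -> int:
    """Dot product of packed word arrays using a 3-stage software pipeline.

    Assumes len(a) == len(b) >= 2.
    """
    n = len(a)
    sum0 = sum1 = 0  # low-half and high-half accumulators
    # Prolog: fill the pipeline.
    x, y = a[0], b[0]                              # load iteration 0
    p0, p1 = lo16(x) * lo16(y), hi16(x) * hi16(y)  # multiply iteration 0
    x, y = a[1], b[1]                              # load iteration 1
    # Kernel: accumulate iteration i-2, multiply i-1, load i.
    for i in range(2, n):
        sum0, sum1 = sum0 + p0, sum1 + p1
        p0, p1 = lo16(x) * lo16(y), hi16(x) * hi16(y)
        x, y = a[i], b[i]
    # Epilog: drain the pipeline.
    sum0, sum1 = sum0 + p0, sum1 + p1              # accumulate iteration n-2
    p0, p1 = lo16(x) * lo16(y), hi16(x) * hi16(y)  # multiply iteration n-1
    sum0, sum1 = sum0 + p0, sum1 + p1              # accumulate iteration n-1
    return sum0 + sum1


xs = [1, 2, 3, 4, 5, 6]
ys = [7, 8, 9, 10, 11, 12]
a = [pack(xs[k], xs[k + 1]) for k in range(0, len(xs), 2)]
b = [pack(ys[k], ys[k + 1]) for k in range(0, len(ys), 2)]
print(dotp_pipelined(a, b))  # 217
```

Inside the kernel, the statements run in the order accumulate, multiply, load, so each one reads the values produced by the previous step; this mimics the parallel instructions of the kernel reading registers before any of them are overwritten.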