Digital Signal Processing Reference
TABLE 8.1 Schedule Table of Dot Product Before Software Pipelining for Fixed-Point Implementation

        Cycles
Units   1, 9, . . .   2, 10, . . .  3, 11, . . .  4, 12, . . .  5, 13, . . .  6, 14, . . .  7, 15, . . .  8, 16, . . .
.D1     LDW
.D2     LDW
.M1                                                                           MPY
.M2                                                                           MPYH
.L1                                                                                                       ADD
.L2                                                                                                       ADD
.S1                   SUB
.S2                                 B
TABLE 8.2 Schedule Table of Dot Product After Software Pipelining for Fixed-Point Implementation

        Cycles
        Prolog                                    Loop kernel
Units   1     2     3     4     5     6     7     8
.D1     LDW   LDW   LDW   LDW   LDW   LDW   LDW   LDW
.D2     LDW   LDW   LDW   LDW   LDW   LDW   LDW   LDW
.M1                                   MPY   MPY   MPY
.M2                                   MPYH  MPYH  MPYH
.L1                                               ADD
.L2                                               ADD
.S1           SUB   SUB   SUB   SUB   SUB   SUB   SUB
.S2                 B     B     B     B     B     B
From Table 8.1, the two LDW instructions are in parallel and are issued in cycles 1,
9, 17, . . . The SUB instruction is issued in cycles 2, 10, 18, . . . This is followed by the
branch (B) instruction, issued in cycles 3, 11, 19, . . . The two parallel instructions MPY
and MPYH are issued in cycles 6, 14, 22, . . . The two ADD instructions are issued in cycles
8, 16, 24, . . .
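The issue pattern described above can be reproduced with a short sketch. The offsets are read directly from Table 8.1; the names ISSUE_OFFSET, ITERATION_LENGTH, and issue_cycles are hypothetical helpers introduced only for this illustration, not part of any TI tool.

```python
# Cycle in which each instruction issues within one iteration (read from Table 8.1).
ISSUE_OFFSET = {"LDW": 1, "SUB": 2, "B": 3, "MPY": 6, "MPYH": 6, "ADD": 8}

# Before software pipelining, a new iteration starts every 8 cycles.
ITERATION_LENGTH = 8


def issue_cycles(instruction, iterations):
    """Return the issue cycles of `instruction` for the first `iterations` iterations."""
    first = ISSUE_OFFSET[instruction]
    return [first + k * ITERATION_LENGTH for k in range(iterations)]


if __name__ == "__main__":
    for name in ("LDW", "SUB", "B", "MPY", "MPYH", "ADD"):
        print(name, issue_cycles(name, 3))
```

For three iterations this lists LDW in cycles 1, 9, 17 and ADD in cycles 8, 16, 24, matching the sequences quoted in the text.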
Table 8.2 extends Table 8.1 to illustrate the different stages: prolog (cycles 1 through
7), loop kernel (cycle 8), and epilog (cycles 9, 10, . . ., not shown). Each instruction
that first appears in the prolog is repeated in every following cycle, up to and including
the loop kernel (cycle 8), where all eight units are busy. Instructions in the epilog stage
(cycles 9, 10, . . .) complete the functionality of the code.
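As a rough illustration of how the prolog fills up to the loop kernel, the sketch below rebuilds the pipelined schedule from the single-iteration offsets of Table 8.1. It assumes that a new iteration starts in every cycle, which is what the repeated LDW entries in Table 8.2 suggest, and it covers only the prolog and kernel, not the epilog. ONE_ITERATION, pipelined_schedule, and kernel_cycle are hypothetical names used only here.

```python
# (unit, instruction, issue cycle within one iteration), read from Table 8.1.
ONE_ITERATION = [
    (".D1", "LDW", 1), (".D2", "LDW", 1),
    (".M1", "MPY", 6), (".M2", "MPYH", 6),
    (".L1", "ADD", 8), (".L2", "ADD", 8),
    (".S1", "SUB", 2), (".S2", "B", 3),
]


def pipelined_schedule(last_cycle):
    """Map cycle -> {unit: instruction}, assuming one new iteration per cycle.

    Valid only while new iterations are still being started (prolog and kernel).
    """
    schedule = {}
    for cycle in range(1, last_cycle + 1):
        # A unit becomes busy once the first iteration has reached its
        # instruction, and stays busy because later iterations follow
        # one cycle apart.
        schedule[cycle] = {
            unit: op for unit, op, first in ONE_ITERATION if cycle >= first
        }
    return schedule


def kernel_cycle():
    """First cycle in which every unit is busy: the largest issue offset."""
    return max(first for _, _, first in ONE_ITERATION)


if __name__ == "__main__":
    for cycle, busy in pipelined_schedule(8).items():
        stage = "kernel" if cycle >= kernel_cycle() else "prolog"
        print(cycle, stage, busy)
```

Under these assumptions the kernel falls in cycle 8 and cycles 1 through 7 form the prolog, in agreement with Table 8.2.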
From Table 8.2, efficient, optimized code can be obtained. Note that it is
possible to start processing a new iteration before previous iterations are finished.
Software pipelining allows us to determine when to start a new loop iteration.
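The benefit of overlapping iterations can be estimated with a small cycle count. This is a back-of-the-envelope sketch that counts issue cycles only, up to the last ADD of the last iteration. It assumes one new iteration per cycle after pipelining, as Table 8.2 suggests, and it ignores loop setup and any final combination of partial sums. The function names are hypothetical.

```python
# Cycles per iteration in Table 8.1; the ADD instructions issue in the last one.
ITERATION_LENGTH = 8


def cycles_without_pipelining(iterations):
    """Each iteration runs to completion before the next one starts."""
    return ITERATION_LENGTH * iterations


def cycles_with_pipelining(iterations):
    """Iteration k (1-based) starts in cycle k, so its ADD issues in cycle k + 7."""
    return iterations + ITERATION_LENGTH - 1


if __name__ == "__main__":
    for n in (1, 10, 100):
        print(n, cycles_without_pipelining(n), cycles_with_pipelining(n))
```

With a single iteration both counts are 8 cycles; the gap grows with the iteration count because only the first iteration pays the full 8-cycle fill.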