Digital Signal Processing Reference
Loop Kernel (Cycle)
Within the loop kernel, in cycle 8, each functional unit is used only once. The
minimum iteration interval is the minimum number of cycles that must elapse before
a successive iteration can be initiated. Because no functional unit is needed more
than once in the kernel, this interval is 1, and a new iteration can be initiated
every cycle.
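The resource side of this argument can be sketched in a few lines of Python. This is only an illustration: the assignment of instructions to units in KERNEL is an assumption (one plausible mapping onto the eight C6x units, not taken from the listing), and the function name is hypothetical. The sketch computes only the resource bound on the iteration interval; data dependencies carried from one iteration to the next could raise the bound in other loops.

```python
from collections import Counter

# Assumed mapping of the eight kernel instructions onto the eight
# functional units (hypothetical, for illustration only).
KERNEL = [
    ("LDW", ".D1"), ("LDW", ".D2"),
    ("MPY", ".M1"), ("MPYH", ".M2"),
    ("ADD", ".L1"), ("ADD", ".L2"),
    ("SUB", ".S1"), ("B", ".S2"),
]


def resource_min_iteration_interval(kernel):
    """Resource bound on the iteration interval.

    Each functional unit exists exactly once, so the bound is the
    largest number of instructions that compete for the same unit.
    """
    uses = Counter(unit for _, unit in kernel)
    return max(uses.values())


if __name__ == "__main__":
    print(resource_min_iteration_interval(KERNEL))
```

If a second instruction were assigned to a unit that is already in use, the bound would rise to 2 and a new iteration could start only every other cycle.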
Within loop cycle 8, multiple iterations of the loop execute in parallel, each at a
different stage. For example, the ADDs add data for iteration 1, while MPY and MPYH
multiply data for iteration 3, the LDWs load data for iteration 8, SUB decrements
the counter for iteration 7, and B branches for iteration 6. Note that the values
being multiplied are loaded into registers five cycles prior to the cycle when the
values are multiplied. Before the first multiplication occurs, the fifth load has
just completed. This software pipeline is eight iterations deep.
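The iteration numbers quoted above follow from the cycle in which each instruction first issues. The Python sketch below is a minimal illustration, assuming the first-issue cycles given in the schedule of Example 8.11 (LDW in cycle 1, SUB in 2, B in 3, MPY and MPYH in 6, ADD in 8) and an iteration interval of 1. The names FIRST_ISSUE, iteration_in_flight, and pipeline_depth are hypothetical.

```python
# Cycle in which each instruction is issued for iteration 1
# (taken from the schedule in Example 8.11).
FIRST_ISSUE = {"LDW": 1, "SUB": 2, "B": 3, "MPY": 6, "MPYH": 6, "ADD": 8}


def iteration_in_flight(instr, cycle, interval=1):
    """Iteration that `instr` works on in `cycle`, or None if not yet issued."""
    start = FIRST_ISSUE[instr]
    if cycle < start:
        return None
    return (cycle - start) // interval + 1


def pipeline_depth(cycle):
    """Number of distinct iterations in progress in `cycle` (interval of 1)."""
    active = [iteration_in_flight(i, cycle) for i in FIRST_ISSUE]
    return max(n for n in active if n is not None)


if __name__ == "__main__":
    for name in FIRST_ISSUE:
        print(name, iteration_in_flight(name, 8))
```

The five-cycle gap between a load and the multiply that consumes it is the difference between the first-issue cycles of MPY and LDW.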
Example 8.11: Dot Product Using Software Pipelining for
a Fixed-Point Implementation
This example implements the dot product using software pipelining for a fixed-point
implementation. From Table 8.2, one can readily obtain the ASM code dotpipedfix.asm
shown in Figure 8.13. The loop count is 100 because each iteration calculates two
multiplies and two accumulates. The instructions start in the following cycles:
Cycle 1: LDW, LDW (also initialization of count and accumulators A7 and B7)
Cycle 2: LDW, LDW, SUB
Cycles 3-5: LDW, LDW, SUB, B
Cycles 6-7: LDW, LDW, MPY, MPYH, SUB, B
Cycles 8-107: LDW, LDW, MPY, MPYH, ADD, ADD, SUB, B
Cycle 108: LDW, LDW, MPY, MPYH, ADD, ADD, SUB, B
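The cycle-by-cycle listing can be reproduced from the first-issue cycle of each instruction, since with an iteration interval of 1 an instruction is reissued every cycle once it has started. The Python sketch below is only an illustration of that bookkeeping; START_CYCLE, ORDER, and issued_in_cycle are hypothetical names, and the model makes no attempt to describe how the pipeline drains after the last iteration.

```python
# First-issue cycle of each instruction, as given in the listing.
START_CYCLE = [
    ("LDW", 1), ("LDW", 1),
    ("SUB", 2),
    ("B", 3),
    ("MPY", 6), ("MPYH", 6),
    ("ADD", 8), ("ADD", 8),
]

# Order in which the listing names the instructions.
ORDER = ["LDW", "MPY", "MPYH", "ADD", "SUB", "B"]


def issued_in_cycle(cycle):
    """Instructions issued in `cycle`, in the order used by the listing."""
    active = [name for name, start in START_CYCLE if cycle >= start]
    return sorted(active, key=ORDER.index)


if __name__ == "__main__":
    for cycle in range(1, 9):
        print(cycle, ", ".join(issued_in_cycle(cycle)))
```

The first cycle in which all eight instructions issue together is the first kernel cycle; every earlier cycle belongs to the prolog.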
The prolog section is within cycles 1 through 7; the loop kernel is in cycle 8, where
all the instructions are in parallel; and the epilog section is in cycle 108. Note that
SUB is made conditional to ensure that the loop counter A1 is no longer decremented
once it reaches zero.
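What the loop computes, as opposed to when each instruction issues, can be modeled with a short Python sketch. It assumes that each LDW fetches one 32-bit word holding two 16-bit samples, that MPY multiplies the signed lower halves and MPYH the signed upper halves, and that the two accumulators (A7 and B7 in the text) are summed once the loop has finished. The helper names and the packing order are hypothetical, and Python integers do not wrap at 32 bits as the hardware accumulators would.

```python
def to_signed16(value):
    """Interpret the low 16 bits of `value` as a signed number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def pack_pairs(samples):
    """Pack 16-bit samples two per 32-bit word (first sample in the low half)."""
    if len(samples) % 2:
        raise ValueError("need an even number of samples")
    return [((hi & 0xFFFF) << 16) | (lo & 0xFFFF)
            for lo, hi in zip(samples[0::2], samples[1::2])]


def dot_product_packed(a_words, x_words):
    """Dot product with two multiplies and two accumulates per iteration."""
    count = len(a_words)   # loop counter (A1 in the text)
    acc_low = 0            # models accumulator A7
    acc_high = 0           # models accumulator B7
    for a_word, x_word in zip(a_words, x_words):
        acc_low += to_signed16(a_word) * to_signed16(x_word)             # MPY
        acc_high += to_signed16(a_word >> 16) * to_signed16(x_word >> 16)  # MPYH
        if count != 0:     # conditional SUB: never decrement past zero
            count -= 1
    return acc_low + acc_high


if __name__ == "__main__":
    a = list(range(1, 201))   # 200 samples, hence 100 loop iterations
    x = [2] * 200
    print(dot_product_packed(pack_pairs(a), pack_pairs(x)))
```

With 200 samples in each array, pack_pairs yields 100 words per array, which matches the loop count of 100 in the example.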