Digital Signal Processing Reference
[Figure 8.10, panels (a), (b), (c): nodes labeled "Pipeline adder" and "Pipeline multiplier", coefficients a0 and a1, signals x[n], y[n], x[2n], x[2n+1], y[2n], y[2n+1], and reset rst_n]
Figure 8.10 Unfolding and retiming of a feedback DFG. (a) Recursive DFG with seven algorithmic
registers. (b) Retiming of registers for associating algorithmic registers with computational nodes for
effective unfolding. (c) Unfolded design for optimal utilization of algorithmic registers
unfolding factor J. This increase occurs because, although all the computational nodes are replicated
J times, the number of registers in the unfolded DFG remains the same. For feedback designs,
unfolding may therefore be effective in design instances where there are abundant algorithmic
registers for pipelining the combinational nodes. In these designs, unfolding followed by retiming
provides flexibility in placing these algorithmic registers in the unfolded design while optimizing
timing. Similarly, for feedforward designs, pipeline registers are first added, and the design is then
unfolded and retimed for effective placement of registers, as explained in Section 8.4.5.
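The register-count argument above can be made concrete with a small Python sketch of the standard J-unfolding rule for DFG edges (as formulated by Parhi): an edge U->V carrying w registers becomes J edges Ui->V((i+w) mod J), each carrying floor((i+w)/J) registers. The two-node loop, the node names A and M, and the helper names unfold, total_delays and node_count are hypothetical illustrations and are not taken from Figure 8.10.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    src: str     # source node name
    dst: str     # destination node name
    delays: int  # algorithmic registers on the edge


def unfold(edges, J):
    """J-unfold a DFG: node U becomes U0..U(J-1), and an edge U->V with w
    delays becomes J edges Ui -> V((i + w) mod J), each carrying
    floor((i + w) / J) delays."""
    unfolded = []
    for e in edges:
        for i in range(J):
            unfolded.append(Edge(f"{e.src}{i}",
                                 f"{e.dst}{(i + e.delays) % J}",
                                 (i + e.delays) // J))
    return unfolded


def total_delays(edges):
    """Total number of registers in a DFG given as a list of edges."""
    return sum(e.delays for e in edges)


def node_count(edges):
    """Number of distinct computational nodes referenced by the edges."""
    return len({e.src for e in edges} | {e.dst for e in edges})


# Hypothetical recursive DFG: adder A feeds multiplier M, and M feeds A back
# through seven algorithmic registers.
dfg = [Edge("A", "M", 0), Edge("M", "A", 7)]
unfolded = unfold(dfg, 2)

print(node_count(dfg), total_delays(dfg))            # 2 nodes, 7 registers
print(node_count(unfolded), total_delays(unfolded))  # 4 nodes, 7 registers
for e in unfolded:
    print(e)
```

With J = 2 the loop now passes through four nodes (A0, M0, A1, M1), but the seven registers are only redistributed (3 on M0->A1 and 4 on M1->A0). Because the computation in the loop doubles while the register count does not, the loop bound grows by a factor of J.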
The registers in DFGs can be retimed for effective pipelining of the combinational cloud. Where
the designer uses embedded computational units, or computational units with limited pipeline
support, there may be extra registers that are not used for reducing the critical path of the
design. In these designs the critical path is greater than the IPB. For example, the designer might
intend to use building blocks already embedded in an FPGA, such as the DSP48. These blocks
have a fixed pipeline option, so extra registers do not help in achieving the IPB. By unfolding and
retiming, the design can be mapped appropriately onto the embedded blocks so that all the
registers are used effectively.
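Retiming can also be sketched in a few lines of Python. The fragment below uses the usual retiming relation w_r(U->V) = w(U->V) + r(V) - r(U) and rejects any retiming that would leave an edge with a negative register count. The loop, the lag values, and the function name retime are hypothetical. The sketch only shows that registers can be moved from the feedback path into the forward path, where they could serve as pipeline stages of an embedded block, without changing the total number of registers in the loop.

```python
def retime(edges, r):
    """Retime a DFG given as (src, dst, delays) tuples.

    r maps a node name to its integer lag (missing nodes have lag 0). Each
    edge U->V gets w + r[V] - r[U] registers; a negative count means the
    retiming is not realizable, so ValueError is raised.
    """
    retimed = []
    for src, dst, w in edges:
        w_r = w + r.get(dst, 0) - r.get(src, 0)
        if w_r < 0:
            raise ValueError(f"illegal retiming: {src}->{dst} would need {w_r} registers")
        retimed.append((src, dst, w_r))
    return retimed


# Hypothetical loop: adder A -> multiplier M with no registers, and
# M -> A through seven algorithmic registers.
loop = [("A", "M", 0), ("M", "A", 7)]

# A lag of 2 on M moves two registers from M's output edge to its input
# edge, e.g. to use them as input pipeline stages of an embedded multiplier.
pipelined = retime(loop, {"M": 2})
print(pipelined)                        # [('A', 'M', 2), ('M', 'A', 5)]
print(sum(w for _, _, w in pipelined))  # 7: the loop keeps all its registers
```

A lag on A alone, such as {"A": 1}, would demand -1 registers on A->M and is rejected, which mirrors the constraint that retiming can only move registers that actually exist on a path.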
Figure 8.10(a) shows a design with seven algorithmic registers. The registers can be retimed such
that each computational unit has two registers to be used as pipeline registers, as shown in
Figure 8.10(b). The design is then unfolded, and the registers are retimed for optimal HW mapping,
as shown in Figure 8.10(c). The RTL Verilog code of the three designs is listed here:
/* IIR filter of Fig. 8.10(a), having excessive
algorithmic registers */
module IIRFilter
(
input clk, rst_n,
input signed [15:0] xn, //Q1.15