Digital Signal Processing Reference
add_reg[1] <= mul_reg[1]+add_reg[0];
add_reg[2] <= mul_reg[2]+add_reg[1];
add_reg[3] <= mul_reg[3]+add_reg[2];
mul_reg[0] <= xn*h3;
mul_reg[1] <= xn*h2;
mul_reg[2] <= xn*h1;
mul_reg[3] <= xn*h0;
end
// Full-precision output: the last adder register holds the accumulated sum
assign yn_f = add_reg[3];
// Quantizing to Q1.15
assign yn = yn_f[31:16];
endmodule
When the required throughput is almost twice what one stage of pipelining and mapping on DSP48
blocks achieves, that approach alone cannot raise the throughput any further. In these cases
unfolding becomes very handy. The designer can add pipeline registers as shown in
Figure 8.9(a). Enough pipeline registers should be added that each computational unit has
two sets of registers. The DFG is then unfolded, and the registers are retimed and appropriately
placed for mapping on DSP48 blocks. In the example shown, the pipelined DFG is unfolded by a factor
of 2. Each pipelined MAC unit of the unfolded design can then be mapped on a DSP48, and the
resulting architecture processes two input samples at a time. The pipelined DFG and its unfolded
design are shown in Figure 8.9(b). The mapping on DSP48 units is also shown with boxes, with one
box shaded in gray for easy identification. The RTL Verilog code of the design is given here:
/* Pipelining then unfolding for effective mapping on
DSP48-based FPGAs with twice speedup */
module FIRFilterUnfold
(
input clk,
input signed [15:0] xn1, xn2, // Two inputs in Q1.15
output signed [15:0] yn1, yn2); // Two outputs in Q1.15
// All coefficients of the FIR filter are in Q1.15 format
parameter signed [15:0] h0 = 16'b1001100110111011;
parameter signed [15:0] h1 = 16'b1010111010101110;
parameter signed [15:0] h2 = 16'b0101001111011011;
parameter signed [15:0] h3 = 16'b0100100100101000;
// Input sample tap delay line for unfolding design
reg signed [15:0] xn_reg[0:4];
// Pipeline registers for first layer of multipliers
reg signed [31:0] mul_reg1[0:3];
// Pipeline registers for first layer of adders
reg signed [31:0] add_reg1[0:3];
// Pipeline registers for second layer of multipliers
reg signed [31:0] mul_reg2[0:3];
// Pipeline registers for second layer of adders
reg signed [31:0] add_reg2[0:3];
// Temporary wires for first layer of multiplier results
wire signed [31:0] mul_out1[0:3];
// Temporary wires for second layer of multiplication results
wire signed [31:0] mul_out2[0:3];
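Independently of the DSP48-mapped listing above, the following minimal sketch illustrates only what unfolding by a factor of 2 computes: two outputs per clock from two inputs per clock, sharing one delay line. It is not the retimed, pipelined architecture of Figure 8.9(b). The outputs here are unregistered, and the module names FIRUnfold2Sketch and FIRUnfold2SketchTb, the registers p1 to p3, the helper functions xs and fir_ref, and the test samples are all hypothetical names introduced for illustration. The coefficients are copied from the listing, and the testbench compares the unfolded outputs with a single-rate reference sum.

```verilog
// Sketch only: direct (non-retimed, non-pipelined) 2-unfolded 4-tap FIR.
module FIRUnfold2Sketch (
  input clk,
  input signed [15:0] xn1, xn2,      // xn1 = x[2k], xn2 = x[2k+1]
  output signed [31:0] yn1_f, yn2_f  // full-precision y[2k], y[2k+1]
);
  parameter signed [15:0] h0 = 16'b1001100110111011;
  parameter signed [15:0] h1 = 16'b1010111010101110;
  parameter signed [15:0] h2 = 16'b0101001111011011;
  parameter signed [15:0] h3 = 16'b0100100100101000;
  // Shared delay line: p1 = x[2k-1], p2 = x[2k-2], p3 = x[2k-3]
  reg signed [15:0] p1, p2, p3;
  initial begin p1 = 0; p2 = 0; p3 = 0; end
  always @(posedge clk) begin
    p1 <= xn2;
    p2 <= xn1;
    p3 <= p1;
  end
  // Two copies of the filter sum, offset by one sample
  assign yn1_f = xn1*h0 + p1*h1  + p2*h2 + p3*h3;
  assign yn2_f = xn2*h0 + xn1*h1 + p1*h2 + p2*h3;
endmodule

// Self-checking testbench (hypothetical stimulus)
module FIRUnfold2SketchTb;
  reg clk = 0;
  reg signed [15:0] xn1 = 0, xn2 = 0;
  wire signed [31:0] yn1_f, yn2_f;
  reg signed [15:0] x [0:15];        // test samples
  integer n, k, errors;

  FIRUnfold2Sketch dut (.clk(clk), .xn1(xn1), .xn2(xn2),
                        .yn1_f(yn1_f), .yn2_f(yn2_f));

  // x[i] with zero history for i < 0
  function signed [15:0] xs(input integer i);
    xs = (i < 0) ? 16'sd0 : x[i];
  endfunction
  // Single-rate reference: y[i] = h0 x[i] + h1 x[i-1] + h2 x[i-2] + h3 x[i-3]
  function signed [31:0] fir_ref(input integer i);
    fir_ref = xs(i)*dut.h0 + xs(i-1)*dut.h1 + xs(i-2)*dut.h2 + xs(i-3)*dut.h3;
  endfunction

  always #5 clk = ~clk;

  initial begin
    errors = 0;
    for (n = 0; n < 16; n = n + 1)
      x[n] = (n * 37) % 101 - 50;    // arbitrary small samples
    for (k = 0; k < 8; k = k + 1) begin
      xn1 = x[2*k];
      xn2 = x[2*k+1];
      #1;                            // let the combinational outputs settle
      if (yn1_f !== fir_ref(2*k) || yn2_f !== fir_ref(2*k+1))
        errors = errors + 1;
      @(posedge clk); #1;            // delay line now holds this input pair
    end
    if (errors == 0) $display("PASS");
    else $display("FAIL: %0d mismatches", errors);
    $finish;
  end
endmodule
```

The two assign statements make the structure of unfolding visible: both sums reuse the same delayed samples, so doubling the sample rate per clock costs a second set of multipliers and adders but no second delay line. A design meant for DSP48 mapping would additionally register the products and partial sums, as the declarations in the listing above suggest.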