Unfolding and Folding of Architectures - Digital Design of Signal Processing Systems: A Practical Approach

    add_reg[1] <= mul_reg[1]+add_reg[0];
    add_reg[2] <= mul_reg[2]+add_reg[1];
    add_reg[3] <= mul_reg[3]+add_reg[2];
    mul_reg[0] <= xn*h3;
    mul_reg[1] <= xn*h2;
    mul_reg[2] <= xn*h1;
    mul_reg[3] <= xn*h0;
end

// Full-precision output: the last adder register holds the complete sum
assign yn_f = add_reg[3];
// Quantizing to Q1.15
assign yn = yn_f[31:16];

endmodule
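
A software model can help cross-check the register updates of the filter above. The following is a minimal Python sketch, not the author's code. It assumes that the part of the always block not shown here registers add_reg[0] <= mul_reg[0], that the coefficients are the ones listed for the filters in this section, and that the full-precision output is taken from the last adder register, add_reg[3]. The names to_signed and tdf_fir are hypothetical.

```python
def to_signed(value, bits):
    """Wrap an integer to a two's-complement value of the given width."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


# Coefficient bit patterns (Q1.15), read as signed 16-bit integers
H = [to_signed(int(b, 2), 16) for b in (
    "1001100110111011",  # h0
    "1010111010101110",  # h1
    "0101001111011011",  # h2
    "0100100100101000",  # h3
)]


def tdf_fir(samples, h=H):
    """Clock-by-clock model of the registered TDF filter.

    Returns two lists with one entry per clock edge: the 32-bit
    full-precision output and its upper 16 bits.
    """
    n = len(h)
    mul_reg = [0] * n
    add_reg = [0] * n
    full, quant = [], []
    for xn in samples:
        # Non-blocking assignments: every next value uses the old state.
        # Assumption: add_reg[0] <= mul_reg[0] (line not shown in the text).
        next_add = [
            to_signed(mul_reg[k] + (add_reg[k - 1] if k else 0), 32)
            for k in range(n)
        ]
        # mul_reg[k] <= xn * h[n-1-k], i.e. mul_reg[0] uses h3, mul_reg[3] uses h0
        next_mul = [to_signed(xn * h[n - 1 - k], 32) for k in range(n)]
        add_reg, mul_reg = next_add, next_mul
        yn_f = add_reg[n - 1]    # assumed output tap: last adder register
        full.append(yn_f)
        quant.append(yn_f >> 16)  # same bits as yn_f[31:16]
    return full, quant


# Unit impulse (one LSB): the output is a zero, then h0..h3, then a zero
full, _ = tdf_fir([1, 0, 0, 0, 0, 0])
print(full)
```

Driving the model and an RTL simulation with the same input vector and comparing the two output streams is one way to catch a wrong tap or a misplaced register.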

When the required throughput is almost twice what one stage of pipelining achieves, mapping
the design on DSP48 blocks will not by itself improve the throughput any further. In these
cases unfolding becomes very handy. The designer can add pipeline registers as shown in
Figure 8.9(a). The number of pipeline registers should be such that each computational unit
has two sets of registers, so that after unfolding by 2 each copy of the unit keeps one set.
The DFG is then unfolded, and the registers are retimed and placed appropriately for mapping
on DSP48 blocks. In the example shown, the pipelined DFG is unfolded by a factor of 2. Each
pipelined MAC unit of the unfolded design can then be mapped on a DSP48, and the architecture
processes two input samples at a time. The pipelined DFG and its unfolded design are shown in
Figure 8.9(b). The mapping on DSP48 units is marked with boxes, one of which is shaded in gray
for easy identification. The RTL Verilog code of the design is given here:

/* Pipelining then unfolding for effective mapping on
   DSP48-based FPGAs with twice speedup */
module FIRFilterUnfold
(
    input clk,
    input signed [15:0] xn1, xn2, // Two inputs in Q1.15
    output signed [15:0] yn1, yn2); // Two outputs in Q1.15

// All coefficients of the FIR filter are in Q1.15 format
parameter signed [15:0] h0 = 16'b1001100110111011;
parameter signed [15:0] h1 = 16'b1010111010101110;
parameter signed [15:0] h2 = 16'b0101001111011011;
parameter signed [15:0] h3 = 16'b0100100100101000;

// Input sample tap delay line for unfolding design
reg signed [15:0] xn_reg[0:4];
// Pipeline registers for first layer of multipliers
reg signed [31:0] mul_reg1[0:3];
// Pipeline registers for first layer of adders
reg signed [31:0] add_reg1[0:3];
// Pipeline registers for second layer of multipliers
reg signed [31:0] mul_reg2[0:3];
// Pipeline registers for second layer of adders
reg signed [31:0] add_reg2[0:3];
// Temporary wires for first layer of multiplier results
wire signed [31:0] mul_out1[0:3];
// Temporary wires for second layer of multiplication results
wire signed [31:0] mul_out2[0:3];
