8 Unfolding and Folding of Architectures
8.1 Introduction
Major decisions in digital design are based on the ratio of the sampling clock to the circuit clock. The
sampling clock is specific to an application and is derived from the Nyquist sampling criterion or a
band-pass sampling constraint. The circuit clock, on the other hand, depends primarily on the design
and the technology used for implementation. In many high-end applications, the main focus of
design is to run the circuit at the highest possible clock rate to achieve the desired throughput. If a simple
mapping of the dataflow graph (DFG) onto hardware cannot be synthesized at the required clock rate,
the designer can apply one of several techniques.
In feedforward designs, an unfolding transformation makes parallel processing possible, which
increases throughput. Pipelining is another option in feedforward designs for better
timing. Pipelining is usually the option of choice, as it results in a smaller area than an unfolded
design. In feedback DFGs, the unfolding transformation does not result in true parallel processing:
the circuit clock rate must be reduced, so unfolding gives no improvement in the iteration period
bound (IPB). The only benefit of the unfolding transformation is that the circuit can be
run at a slower clock, as each register is clocked slower by the unfolding factor. In FPGA-based design, with a
fixed number of registers and embedded computational units, unfolding helps in optimizing designs
that require too many algorithmic registers. The design is first unfolded and then the excess
registers are retimed to give better timing. The chapter presents designs of FIR and IIR filters where
unfolding followed by retiming achieves better performance.
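The claim that unfolding a feedback DFG gives no IPB improvement can be checked on a small example. The sketch below is illustrative only and is not taken from the chapter: it uses the commonly stated unfolding rule (an edge U -> V with w delays becomes, for each i in 0..J-1, an edge U_i -> V_((i+w) mod J) with floor((i+w)/J) delays) and applies it to a hypothetical first-order recursion y[n] = a*y[n-1] + x[n]. The node names, the node execution times and the helper functions are assumptions made for the illustration.

```python
def unfold(edges, J):
    """Unfold a DFG by factor J.

    edges: list of (src, dst, delays) tuples.
    Returns edges between (node, copy_index) pairs.
    """
    unfolded = []
    for src, dst, w in edges:
        for i in range(J):
            unfolded.append(((src, i), (dst, (i + w) % J), (i + w) // J))
    return unfolded


def trace_loop(edges, start, node_time):
    """Walk the single loop through `start`; return (compute_time, delays).

    Assumption: every node has exactly one outgoing edge, which holds for
    this toy loop but not for a general DFG.
    """
    successor = {src: (dst, w) for src, dst, w in edges}
    time, delays, node = 0, 0, start
    while True:
        time += node_time(node)
        node, w = successor[node]
        delays += w
        if node == start:
            return time, delays


# Hypothetical feedback loop: adder -> multiplier -> (one register) -> adder.
loop = [("add", "mul", 0), ("mul", "add", 1)]
T = {"add": 1, "mul": 2}  # assumed execution times in arbitrary units

J = 2
t1, d1 = trace_loop(loop, "add", lambda n: T[n])
unfolded = unfold(loop, J)
t2, d2 = trace_loop(unfolded, ("add", 0), lambda n: T[n[0]])

per_sample_original = t1 / d1        # loop bound per sample
per_sample_unfolded = t2 / d2 / J    # each iteration now covers J samples
print(per_sample_original, per_sample_unfolded)
```

In this toy case the unfolded loop contains both copies of the adder and the multiplier but still only one register, so the time per sample is unchanged. The unfolded circuit simply produces J samples per (slower) clock cycle.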
In contrast to dedicated or parallel architectures, time-shared architectures are used when
the circuit clock is at least twice as fast as the sampling clock. A design running at
circuit clock speed can reuse its hardware resources, because the input data remains valid for multiple
circuit clocks. For many applications the designer can easily come up with a time-shared
architecture. The chapter describes several examples to highlight these design issues.
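As a rough illustration of hardware reuse, the following behavioral sketch models a 4-tap FIR filter that shares one multiply-accumulate (MAC) unit across four circuit clocks per input sample. It is a software model under stated assumptions, not a design from the chapter: the function name, the coefficient values and the assumption that the circuit clock is exactly len(coeffs) times the sampling clock are all hypothetical.

```python
def fir_time_shared(samples, coeffs):
    """Model an FIR filter that reuses a single MAC unit.

    Assumption: the circuit clock runs len(coeffs) times faster than the
    sampling clock, so each input sample stays valid long enough for the
    one MAC to visit every tap.
    Returns (outputs, total_circuit_clocks).
    """
    taps = [0] * len(coeffs)      # tapped delay line, newest sample first
    outputs = []
    circuit_clocks = 0
    for x in samples:             # one pass per sampling clock
        taps = [x] + taps[:-1]    # shift the new sample into the delay line
        acc = 0
        for k in range(len(coeffs)):      # one pass per circuit clock
            acc += coeffs[k] * taps[k]    # the single shared MAC operation
            circuit_clocks += 1
        outputs.append(acc)
    return outputs, circuit_clocks


outputs, clocks = fir_time_shared([1, 2, 3, 4], [1, 1, 1, 1])
print(outputs, clocks)
```

A fully dedicated version of the same filter would need four multipliers and produce one output per clock. The time-shared model trades that area for four circuit clocks per output.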
For many other applications, however, designing an optimal time-shared architecture is not simple. This
chapter covers mathematical transformation techniques for folding in time-multiplexed architectures.
These transformations take the DFG representation of a synchronous digital signal processing
(DSP) algorithm and a folding factor along with a schedule of folding, and then they systematically