Unfolding and Folding of Architectures - Digital Design of Signal Processing Systems: A Practical Approach

8 Unfolding and Folding of Architectures

8.1 Introduction

Major decisions in digital design are based on the ratio of the sampling clock to the circuit clock.
The sampling clock is specific to an application and is derived from the Nyquist sampling criterion
or the band-pass sampling constraint. The circuit clock, on the other hand, depends primarily on the
design and the technology used for implementation. In many high-end applications, the main focus of
design is to run the circuit at the highest possible clock rate to get the desired throughput. If a
simple mapping of the dataflow graph (DFG) onto hardware cannot be synthesized at the required clock
rate, the designer can apply one of several transformation techniques.

In feedforward designs, an unfolding transformation makes parallel processing possible, which
increases throughput. Pipelining is another option in feedforward designs for better timing, and it
is usually the option of choice because it results in a smaller area than an unfolded design. In
feedback DFGs, the unfolding transformation does not result in true parallel processing: the circuit
clock rate needs to be reduced, so unfolding gives no improvement in the iteration period bound
(IPB). The only benefit of the unfolding transformation in that case is that the circuit can be run
at a slower clock, as each register is clocked slower by the unfolding factor. In FPGA-based design,
with a fixed number of registers and embedded computational units, unfolding helps in optimizing
designs that require too many algorithmic registers. The design is first unfolded and then the
excess registers are retimed to give better timing. The chapter presents designs of FIR and IIR
filters where unfolding followed by retiming achieves better performance.
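
As a rough illustration of what unfolding does to a DFG, the sketch below applies the commonly used textbook rule: an edge from U to V with w delays becomes J edges from U_i to V_((i + w) mod J), each carrying floor((i + w) / J) delays. The `Edge` type, the `unfold` function, and the two-node example graph are hypothetical names chosen for this sketch and are not taken from the chapter.

```python
from collections import namedtuple

# Hypothetical DFG edge: source node, destination node, number of registers.
Edge = namedtuple("Edge", "src dst delays")

def unfold(edges, J):
    """Unfold a DFG by factor J.

    Each edge U -> V with w delays is replaced by J edges
    U_i -> V_((i + w) % J) carrying (i + w) // J delays, for i = 0..J-1.
    """
    unfolded = []
    for e in edges:
        for i in range(J):
            unfolded.append(
                Edge(f"{e.src}{i}",
                     f"{e.dst}{(i + e.delays) % J}",
                     (i + e.delays) // J)
            )
    return unfolded

# Assumed example: first-order feedback loop y[n] = a*y[n-1] + x[n].
# Adder A feeds multiplier M through one register; M feeds A directly.
dfg = [Edge("A", "M", 1), Edge("M", "A", 0)]
unfolded = unfold(dfg, 2)

for e in unfolded:
    print(e)

# The total number of registers is the same before and after unfolding.
print(sum(e.delays for e in dfg), sum(e.delays for e in unfolded))
```

In this small example the unfolded graph still contains a single loop, now passing through four nodes and one register. The loop takes twice as long but produces two outputs per iteration, which is consistent with the statement above that unfolding a feedback DFG does not improve the IPB.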

In contrast to dedicated or parallel architectures, time-shared architectures are designed for
cases where the circuit clock is at least twice as fast as the sampling clock. A design running at
circuit clock speed can reuse its hardware resources, because the input data remains valid for
multiple circuit clock cycles. For many applications the designer can easily come up with a
time-shared architecture. The chapter describes several examples to highlight these design issues.
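
The idea of hardware reuse can be sketched in software with an FIR filter. The behavioural model below is a minimal sketch and not a design from the chapter: it assumes the circuit clock is `len(h)` times faster than the sampling clock, so that a single multiply-accumulate (MAC) unit can process every tap before the next input sample arrives. The function names and test data are hypothetical.

```python
def fir_direct(x, h):
    """Reference model: all taps evaluated within one sampling period."""
    out = []
    for n in range(len(x)):
        out.append(sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0))
    return out

def fir_time_shared(x, h):
    """Model of a single MAC unit reused over len(h) circuit clocks per sample."""
    taps = [0] * len(h)      # delay line, shifted once per sampling clock
    out = []
    circuit_clocks = 0
    for sample in x:         # one pass of this loop = one sampling clock
        taps = [sample] + taps[:-1]
        acc = 0
        for k in range(len(h)):   # one pass of this loop = one circuit clock
            acc += h[k] * taps[k]
            circuit_clocks += 1
        out.append(acc)
    return out, circuit_clocks

x = [1, 2, 3, 4]
h = [1, 1, 1]
y_shared, clocks = fir_time_shared(x, h)
print(fir_direct(x, h))   # [1, 3, 6, 9]
print(y_shared, clocks)   # [1, 3, 6, 9] 12
```

Both models produce the same output; the time-shared version trades three multipliers for one at the cost of three circuit clocks per input sample.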

For many applications, designing an optimal time-shared architecture may not be simple. This
chapter covers mathematical transformation techniques for folding in time-multiplexed architectures.
These transformations take the DFG representation of a synchronous digital signal processing (DSP)
algorithm, a folding factor, and a folding schedule, and then they systematically
