wireless system, it is very important to develop area- and power-efficient 4G wireless
receivers. Given the area and power constraints of mobile handsets, one cannot
simply implement computation-intensive DSP algorithms on gigahertz DSPs. It is
also critical to reduce base station power consumption through optimized hardware
accelerator design.
In this second edition, we describe a few DSP algorithms that dominate the
computational complexity of a wireless receiver. These algorithms, including
Viterbi decoding, Turbo decoding, LDPC decoding, MIMO detection, and channel
equalization/FFT, need to be off-loaded to hardware coprocessors or accelerators
to achieve high performance. These hardware accelerators are often integrated on
the same die as the DSP processor. In addition, it is also possible to leverage field-
programmable gate arrays (FPGAs) to provide reconfigurable massive computation
capability, as described in another chapter of this handbook [40].
DSP workloads are typically numerically intensive, with large amounts of both
instruction- and data-level parallelism. In order to exploit this parallelism with a
programmable processor, most DSP architectures utilize Very Long Instruction
Word (VLIW) designs. VLIW architectures typically include multiple
register files on the processor die, rather than the single monolithic register file
often found in general-purpose computing. Examples of such architectures
include the Freescale StarCore processors, the Texas Instruments TMS320C6x series
DSPs, and the SHARC DSPs from Analog Devices, to name a few [3, 22, 63].
A comprehensive overview of general-purpose DSP processors is given in
another chapter of this handbook [58].
In some cases, due to the idiosyncratic nature of many DSPs and the imple-
mentation of some of the more powerful instructions in the DSP core, an optimizing
compiler cannot always target core functionality in an optimal manner. Examples
include high-performance fractional arithmetic instructions, which may bundle
SIMD-style functionality that the compiler cannot always prove safe to generate
at compile time.
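To make this concrete, the sketch below models in portable C the kind of saturating Q15 fractional multiply and multiply-accumulate that many DSP cores execute in a single instruction; the function names and structure are our own illustration, not taken from any of the ISAs cited above. The saturation branch is exactly the behavior a compiler cannot always prove is intended when it sees equivalent C written by hand.

```c
#include <stdint.h>

/* Saturating Q15 fractional multiply: a C model of the single-cycle
   fractional multiply instruction found on many DSP cores (names and
   structure are illustrative, not from a specific ISA). */
static int16_t q15_mul(int16_t a, int16_t b) {
    int32_t p = (int32_t)a * (int32_t)b;   /* full 32-bit product */
    p >>= 15;                              /* rescale back to Q15 */
    if (p > INT16_MAX) p = INT16_MAX;      /* saturate on overflow: only */
    if (p < INT16_MIN) p = INT16_MIN;      /* -1.0 * -1.0 can hit this   */
    return (int16_t)p;
}

/* Q15 multiply-accumulate into a wide 32-bit accumulator, as performed
   by a hardware MAC unit (rescaling deferred until readout). */
static int32_t q15_mac(int32_t acc, int16_t a, int16_t b) {
    return acc + (int32_t)a * (int32_t)b;
}
```

For example, `q15_mul(0x4000, 0x4000)` (0.5 × 0.5 in Q15) yields `0x2000` (0.25), while multiplying −1.0 by −1.0 saturates to the largest representable value rather than wrapping.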
While the aforementioned VLIW-based DSP architectures provide increased
parallelism and higher numerical throughput, this comes at the cost of
programmability. Typically, such machines depend on advanced
optimizing compilers capable of aggressively analyzing the instruction- and
data-level parallelism in the target workloads and mapping it onto the parallel
hardware. Due to the large number of parallel functional units and the deep
pipelines, modern DSPs are often difficult to hand-program at the assembly level
with optimal results. As such, one technique used by the optimizing compiler
is to vectorize much of the data-level parallelism often found in DSP workloads. In
doing so, the compiler can often fully exploit the single instruction, multiple data
(SIMD) functionality found in modern DSP instruction sets.
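As a small illustration of the kind of loop a vectorizing compiler can map onto SIMD lanes, consider the inner product below (a common DSP kernel, e.g. an FIR tap sum); the `restrict` qualifiers assert that the input arrays do not alias, which is often precisely the guarantee the compiler needs to prove the loop safe to vectorize. This is a generic sketch, not code for any particular DSP.

```c
#include <stddef.h>
#include <stdint.h>

/* Vectorization-friendly Q15 inner product: `restrict` promises the
   compiler that x and y do not overlap, so the loop iterations are
   independent and can be mapped onto SIMD multiply-accumulate lanes. */
int32_t dot_q15(const int16_t *restrict x,
                const int16_t *restrict y, size_t n) {
    int32_t acc = 0;
    for (size_t i = 0; i < n; i++)
        acc += (int32_t)x[i] * y[i];   /* widened MAC per element */
    return acc;
}
```

Without `restrict` (or equivalent alias analysis), the compiler may conservatively assume that writes through one pointer could affect reads through the other and fall back to scalar code.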
Despite such highly parallel programmable processor cores and advanced com-
piler technology, however, it is quite often the case that the amount of available
instruction- and data-level parallelism in modern signal processing workloads far
exceeds the limited resources available in a VLIW-based programmable processor
core. For example, the implementation complexity for a 40 Kbps DS-CDMA system