Digital Signal Processing Reference
would be 41.8 Gflops for 60 users [68], not to mention a 100 Mbps 3GPP LTE
system. This complexity far exceeds the capability of today's DSP processors,
which typically provide 1-5 Gflops of performance, such as the 1.5 Gflops TI C6711
DSP processor and the 1.8 Gflops ADI TigerSHARC processor. In other cases, the
functionality required by the workload is not efficiently supported by the more
general-purpose instruction sets typically found in embedded systems. As such,
acceleration is often required at both the fine-grain and coarse-grain levels: the
former for instruction set architecture (ISA)-like optimization, and the latter for
task-like optimization [17].
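To make the size of this gap concrete, the following minimal sketch divides the quoted 41.8 Gflops requirement by the quoted peak rates of the two DSPs. It assumes perfect scaling at peak throughput, which real systems do not achieve, so the device counts are lower bounds only. The function name devices_needed is illustrative and does not come from the text.

```python
import math

# Figures quoted in the text above.
REQUIRED_GFLOPS = 41.8  # workload estimate for 60 users
DSP_PEAK_GFLOPS = {
    "TI C6711": 1.5,
    "ADI TigerSHARC": 1.8,
}


def devices_needed(required_gflops, peak_gflops):
    """Lower bound on device count, assuming perfect scaling at peak rate."""
    return math.ceil(required_gflops / peak_gflops)


for name, peak in DSP_PEAK_GFLOPS.items():
    ratio = REQUIRED_GFLOPS / peak
    count = devices_needed(REQUIRED_GFLOPS, peak)
    print(f"{name}: workload is {ratio:.1f}x peak, at least {count} devices")
```

Even under this optimistic assumption, the workload is more than twenty times the peak rate of either device.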
Additionally, wireless system designers often prefer the programmability offered
by software running on a DSP core over a hardware-based accelerator, because it
allows flexibility in various proprietary algorithms. One example is channel
estimation in baseband processing, for which a given vendor may want to use its
own algorithm to handle various users under varying system conditions rather than
a pre-packaged solution. Typically these demands result in a heterogeneous system
which may include one or more of the following: software-programmable DSP cores
for data processing, hardware-based accelerator engines for data processing, and
in some instances general-purpose processors or microcontroller-type solutions for
control processing.
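The programmability argument can be sketched as a receive chain whose channel estimator is a swappable function. This is an illustration only: ls_estimate is the textbook least-squares pilot estimate (received pilot divided by transmitted pilot), while averaged_estimate stands in for a hypothetical vendor-specific variant that assumes a flat channel. None of these names come from the text or from any real DSP library.

```python
from typing import Callable, List, Sequence

# An estimator maps (received pilots, transmitted pilots) to channel taps.
Estimator = Callable[[Sequence[complex], Sequence[complex]], List[complex]]


def ls_estimate(rx_pilots, tx_pilots):
    """Least-squares estimate: H[k] = Y[k] / X[k] at each pilot position."""
    return [y / x for y, x in zip(rx_pilots, tx_pilots)]


def averaged_estimate(rx_pilots, tx_pilots):
    """Hypothetical vendor variant: average the LS estimates (flat channel assumed)."""
    h = ls_estimate(rx_pilots, tx_pilots)
    mean = sum(h) / len(h)
    return [mean] * len(h)


def equalize(rx_data, channel):
    """One-tap equalizer: divide each received symbol by its channel estimate."""
    return [y / h for y, h in zip(rx_data, channel)]


def receive(rx_pilots, tx_pilots, rx_data, estimator: Estimator = ls_estimate):
    """Receive chain in which only the estimator is swapped by the vendor."""
    channel = estimator(rx_pilots, tx_pilots)
    return equalize(rx_data, channel)
```

In software, replacing the estimator is a one-argument change to receive. A fixed-function hardware block would not offer that freedom, which is the flexibility the paragraph above describes.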
The motivation for heterogeneous DSP system solutions that include hardware
acceleration stems from the tradeoff between software programmability and the
performance gains of custom hardware acceleration in its various forms. A number
of heterogeneous accelerator-based architectures are available today, along with
various designs proposed by the research community.
A number of DSP architectures include true hardware-based accelerators that are
not programmable by the end user. Examples include the Texas Instruments C55x
and C64x series of DSPs, which include hardware-based Viterbi or Turbo decoder
accelerators for wireless channel decoding [64, 65].
1.1 Coarse Grain Versus Fine Grain Accelerator Architectures
Coarse-grain accelerator-based DSP systems use a co-processor type design in
which larger amounts of work are run on the co-processor device, which is
sometimes configurable. Current technologies in this area support offloading
functionality such as FFT and various matrix-like computations to the accelerator
rather than executing it in software on the programmable DSP core.
As shown in Fig. 1, coarse-grained heterogeneous architectures typically include
a loosely coupled computational grid attached to the host processor. These
architectures are sometimes built using an FPGA, ASIC, or vendor-programmable
acceleration engine for portions of the system. Tightly coupled loop nests or kernels
are then offloaded from executing in software on the host processor to executing in
hardware on the loosely coupled grid.
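The offload pattern described above can be modeled, very loosely, as a host that sends a named kernel to an attached co-processor when one supports it and otherwise falls back to software. The sketch below is a behavioral model only: Accelerator, Host, and the kernel name "fft" are hypothetical, and the "accelerator" here is ordinary Python code standing in for the hardware grid.

```python
import cmath


def software_dft(x):
    """Reference DFT, standing in for code run on the programmable DSP core."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


class Accelerator:
    """Hypothetical coarse-grain co-processor exposing a set of named kernels."""

    def __init__(self, kernels):
        self._kernels = dict(kernels)

    def supports(self, name):
        return name in self._kernels

    def run(self, name, data):
        return self._kernels[name](data)


class Host:
    """Host processor that offloads a kernel when the accelerator supports it."""

    def __init__(self, accelerator=None):
        self.accelerator = accelerator
        self.software = {"fft": software_dft}
        self.log = []  # records where each kernel actually ran

    def execute(self, name, data):
        if self.accelerator is not None and self.accelerator.supports(name):
            self.log.append((name, "accelerator"))
            return self.accelerator.run(name, data)
        self.log.append((name, "software"))
        return self.software[name](data)
```

A host built as Host(Accelerator({"fft": software_dft})) routes execute("fft", samples) to the co-processor model, while Host() runs the same call in software. The calling code is identical in both cases, and only the place of execution changes.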