Digital Signal Processing Reference
complete implementation of the associated application. For example, in echo cancellation the
algorithm first needs to detect double-talk to determine whether the coefficients should be updated.
The algorithms for double-talk detection [18-20] are of a very different nature from filtering and
coefficient adaptation. In addition, the application requires decision logic to select which part of
the algorithm to execute. All these requirements favor implementing the design as a micro-programmed
accelerator, which also provides the flexibility to modify the algorithm at the implementation stage
of the design. The algorithm can even be modified and recoded at any time in the life cycle of the
product.
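To show how different double-talk detection is from the MAC-heavy filtering, the following Python sketch implements the Geigel detector, one classical choice. The text does not say which detector the design uses; the function name `geigel_dtd`, the 512-sample window and the 0.5 threshold (which assumes roughly 6 dB of hybrid echo loss) are assumptions for illustration. The detector needs only absolute values, a running maximum and a comparison, which suits micro-coded decision logic rather than the MAC blocks.

```python
from collections import deque


def geigel_dtd(far, near, window=512, threshold=0.5):
    """Geigel double-talk detector (illustrative sketch, not the book's method).

    far:  far-end samples x[n]
    near: samples d[n] returning from the hybrid (echo plus near-end speech)
    Returns one flag per sample; True means double-talk, so the LMS
    coefficient update should be frozen for that sample.
    """
    recent = deque(maxlen=window)  # |x| over the last `window` samples
    flags = []
    for x, d in zip(far, near):
        recent.append(abs(x))
        # Echo alone should stay below threshold * (recent far-end peak);
        # anything louder is attributed to near-end speech.
        flags.append(abs(d) > threshold * max(recent))
    return flags
```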
For LEC-type applications, the computationally intensive part is still the filtering and coefficient
adaptation, as the filter length is quite large. The accelerator design is optimized to perform these
operations, and its programmability helps incorporate auxiliary algorithms such as voice
activity detection and state-machine coding without additional hardware. To illustrate
this methodology, the remainder of this section gives a detailed design of one such application.
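The decision logic that programmability absorbs can be pictured as a small per-channel state machine. The sketch below is hypothetical: the class `LecController`, its three states and the default 240-sample hangover (30 ms at 8 kHz) are assumptions, not details of the accelerator's microcode. Each sample it takes a far-end voice-activity flag and a double-talk flag and reports whether the coefficients may be adapted.

```python
from enum import Enum


class LecState(Enum):
    IDLE = "idle"                # no far-end speech: keep coefficients frozen
    ADAPT = "adapt"              # far-end speech only: update coefficients
    DOUBLE_TALK = "double_talk"  # near-end speech present: freeze coefficients


class LecController:
    """Hypothetical per-channel LEC control state machine."""

    def __init__(self, hangover=240):
        # Samples to remain in DOUBLE_TALK after the detector clears
        # (assumed value: 240 samples = 30 ms at 8 kHz).
        self.hangover = hangover
        self.state = LecState.IDLE
        self.count = 0

    def step(self, far_active, dtd_flag):
        """Advance one sample given the VAD and double-talk flags."""
        if dtd_flag:
            self.state = LecState.DOUBLE_TALK
            self.count = self.hangover
        elif self.state is LecState.DOUBLE_TALK and self.count > 0:
            self.count -= 1          # still holding after double-talk
        elif far_active:
            self.state = LecState.ADAPT
        else:
            self.state = LecState.IDLE
        return self.state

    def should_adapt(self):
        return self.state is LecState.ADAPT
```

With `hangover=2`, a double-talk flag followed by three clean far-end samples yields DOUBLE_TALK, DOUBLE_TALK, DOUBLE_TALK and then ADAPT.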
11.6.2 Example: LEC Micro-coded Accelerator
A micro-coded accelerator is designed to implement adaptive filter applications, specifically to
perform LEC on multiple channels in a carrier-class VoIP gateway. The processor is primarily
optimized to execute a time-domain LMS-based adaptive LEC algorithm on a number of channels of
speech signals. The filter length for each channel is programmable and can be extended to 512
taps or more. As speech is sampled at 8 kHz, 512 taps correspond to an echo tail length of
512/8 = 64 ms.
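The core per-channel computation can be sketched in NumPy as a plain time-domain LMS echo canceller. This is a floating-point model under stated assumptions: the function name, the step size `mu` and the signal naming are illustrative, and fixed-point effects of a hardware datapath are not modeled. Each sample costs one `num_taps`-long dot product for filtering and another `num_taps` multiply-adds for the coefficient update.

```python
import numpy as np


def lms_echo_canceller(far, near, num_taps=512, mu=1e-3):
    """Single-channel time-domain LMS line echo canceller (floating-point sketch).

    far:  far-end reference x[n], the signal sent toward the hybrid
    near: d[n], the signal returning from the hybrid (echo plus near-end speech)
    Returns (e, w): the echo-cancelled signal e[n] and the final coefficients.
    """
    far = np.asarray(far, dtype=float)
    near = np.asarray(near, dtype=float)
    w = np.zeros(num_taps)        # adaptive estimate of the echo path
    x_buf = np.zeros(num_taps)    # delay line, x_buf[k] = x[n - k]
    e = np.zeros(len(near))
    for n in range(len(near)):
        x_buf = np.roll(x_buf, 1)  # shift the delay line by one sample
        x_buf[0] = far[n]
        y = w @ x_buf              # filtering: num_taps MACs
        e[n] = near[n] - y         # residual after echo subtraction
        w += mu * e[n] * x_buf     # LMS update: num_taps more MACs
    return e, w
```

With the default 512 taps at 8 kHz the filter spans 64 ms of echo tail; the update line is where a double-talk flag would gate adaptation.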
11.6.2.1 Top-level Design
The accelerator consists of a datapath and a controller. The datapath has two MAC blocks capable of
performing four MAC operations every cycle, a logic unit, a barrel shifter, two address generation
units (AGUs) and two blocks of dual-ported data memories (DMs). The MAC block can also be used
as two independent multipliers and two adders. The datapath has two sets of register files and a few
application-specific registers for maximizing reuse of data samples in the convolution operation.
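One common way to reuse data samples in a convolution is to compute two consecutive outputs in the same pass, so that every fetched coefficient and sample feeds two MACs. The Python sketch below illustrates the idea; the function `fir_two_outputs` is hypothetical, and the arrangement is an assumption rather than a description of how the accelerator's application-specific registers are actually organized.

```python
def fir_two_outputs(h, x, n):
    """Compute y[n] and y[n+1], where y[m] = sum_k h[k] * x[m - k].

    Requires len(h) - 1 <= n and n + 1 < len(x). One new sample is
    fetched per tap, yet each tap performs two multiply-accumulates.
    """
    y0 = y1 = 0.0
    x_next = x[n + 1]            # register holding the sample for y[n+1]
    for k, c in enumerate(h):
        x_cur = x[n - k]         # the only new sample fetched for this tap
        y0 += c * x_cur          # MAC for y[n]
        y1 += c * x_next         # MAC for y[n+1], reusing the previous fetch
        x_next = x_cur           # shift: becomes x[n + 1 - (k + 1)]
    return y0, y1
```

For `h = [1, 2, 3]` and `x = [1, 1, 2, 3, 5, 8]`, `fir_two_outputs(h, x, 2)` returns `(7.0, 10.0)`; the variable `x_next` plays the role of a register that keeps the previously fetched sample.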
The controller has a program memory (PM), an instruction decoder (ID), a subroutine module that
supports four nested subroutine calls, and a loop machine that provides zero-overhead support for
four nested loops. The accelerator also has access to an on-chip DMA module that transfers data
from external memory into the DMs. All these controller features are standard capabilities that can
be designed and coded once and then reused in any design based on a micro-coded state machine.
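The behavior of a zero-overhead loop machine can be modeled in a few lines. In this hypothetical Python model (the class `LoopMachine`, the tuple-based toy instruction set and the `run` helper are all assumptions), each LOOP instruction pushes a start address, an end address and an iteration count onto a four-entry stack. When the program counter reaches the end address of the innermost loop, the controller decrements the count and branches back without spending an instruction on the branch.

```python
class LoopMachine:
    """Toy model of a hardware loop stack with four nesting levels."""

    MAX_DEPTH = 4

    def __init__(self):
        self.stack = []  # entries: [start_pc, end_pc, iterations_left]

    def push(self, start_pc, end_pc, count):
        if len(self.stack) == self.MAX_DEPTH:
            raise OverflowError("loop nesting deeper than 4")
        self.stack.append([start_pc, end_pc, count])

    def next_pc(self, pc):
        """PC after the instruction at `pc`; looping back costs no instruction."""
        # Several nested loops may end on the same instruction.
        while self.stack and pc == self.stack[-1][1]:
            top = self.stack[-1]
            top[2] -= 1
            if top[2] > 0:
                return top[0]      # branch back to the loop start
            self.stack.pop()       # loop finished; check the enclosing loop
        return pc + 1


def run(program):
    """Execute a toy program and return the trace of executed PCs."""
    lm, pc, trace = LoopMachine(), 0, []
    while program[pc][0] != "HALT":
        trace.append(pc)
        if program[pc][0] == "LOOP":
            _, count, end_pc = program[pc]
            lm.push(pc + 1, end_pc, count)
        pc = lm.next_pc(pc)
    return trace
```

Running `run([("LOOP", 2, 3), ("LOOP", 3, 2), ("MAC",), ("NOP",), ("HALT",)])` returns `[0, 1, 2, 2, 2, 3, 1, 2, 2, 2, 3]`: the MAC at address 2 executes six times, and only the inner LOOP setup instruction repeats on each outer iteration.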
11.6.2.2 Datapath/Registers
The most intensive part of the algorithm is the convolution with a 512-coefficient FIR filter.
The coefficients are also updated when voice activity is detected on the far-end speech signal. To
perform these operations efficiently, the datapath has two MAC blocks, each with two multipliers and
one adder. These blocks support application-specific instructions for filtering and
adaptation of coefficients. The two adders of the MAC blocks can also be used independently as
general-purpose adders or subtractors. The accelerator also supports logic instructions such as
AND, OR and EXOR of two operands. The engine has two register files to store the coefficients
and input data. Figure 11.8 shows the complete datapath of the accelerator.
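A back-of-envelope budget shows why the datapath provides four MACs per cycle. The sketch assumes that filtering and coefficient update each cost one MAC per tap per sample, that all four MAC units are busy every cycle, and that loop, DMA and control overhead can be ignored; the function name and the 200 MHz clock are hypothetical, since the text gives no clock rate.

```python
import math


def lec_cycle_budget(taps=512, macs_per_cycle=4, fs_hz=8000,
                     clock_hz=200e6, adapt=True):
    """Idealized cycles per sample per channel, and channels per accelerator."""
    # Filtering needs one MAC per tap; the LMS update needs one more per tap.
    macs_per_sample = taps * (2 if adapt else 1)
    cycles_per_sample = math.ceil(macs_per_sample / macs_per_cycle)
    # Each channel consumes cycles_per_sample cycles every 1/fs_hz seconds.
    channels = int(clock_hz // (cycles_per_sample * fs_hz))
    return cycles_per_sample, channels
```

Under these idealized assumptions, `lec_cycle_budget()` returns `(256, 97)`: a 200 MHz clock would serve at most 97 channels of 512-tap LEC with continuous adaptation, and real figures would be lower once overhead, double-talk detection and VAD processing are counted.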