19.5 VOICE PROCESSING COMPLEXITY (VoIP)

This section gives an overview of processing requirements. The most computationally intensive modules in the VoIP voice chain are the codecs (G.729AB, G.723.1, and the wideband G.722, G.729.1, and G.722.2), the narrowband and wideband echo cancellers, dual-tone multifrequency (DTMF) processing, and packet loss concealment (PLC). The other modules, namely voice activity detection (VAD)/comfort noise generation (CNG), jitter buffers, tone generation, and various tone detectors, take less processing. A group-3 fax chain with V.17 requires more processing, and super group-3 with V.34 requires more still.
For voice processing, the DSPs selected will typically have multiple multiplier and accumulator (MAC) processing units. The number of parallel multipliers determines the speed of processing. Older DSP families such as the ADSP-218x [URL (ADI-218x)] and TI-54x [URL (TI-54x)] have a single multiplier unit. Later families, such as the TI-55x and Blackfin, have dual multiplier units. The TI-64x [URL (Encore-G729AB)] and Starcore families have complex processing units that work like more than two independent multiplier units.
Many voice processing algorithms derive significant performance benefits from a dual-multiplier processor architecture, in some cases as much as 40% to 50%. For example, consider the G.729AB [ITU-T-G.729A (1996), ITU-T-G.729B (1996)] figures from reference [URL (Encore-G729AB)]. A single-multiplier processor such as the ADSP-218x or the TI-54x requires about 12.5 to 13 MCPS (million cycles per second) for G.729AB. Dual-MAC processors such as the TI-55x and Blackfin [URL (ADI-BF536)] take about 8 to 9 MCPS. A processor that works like more than two multiplier units, such as the MSC8101 [URL (Freescale)], consumes 5.0 MCPS [URL (Encore-G729AB)] for the same G.729AB processing.
The ARM9E processor is a typical RISC engine with DSP extensions. For the G.729AB codec, it takes 35 MCPS [URL (Encore-G729AB)], more than even the simple single-multiplier DSPs. However, the simpler instruction set used by RISC processors allows higher clock-speed implementations, so the time taken to execute the algorithm may be comparable with that of a simple DSP. Additional support operations such as the jitter buffer, Real-Time Transport Protocol (RTP), other packetization, VoIP signaling, and minimal network functions are executed on a per-packet basis. These modules typically require 5 to 10 MCPS per voice channel on this type of RISC processor. In general, these functions do not derive any benefit from the DSP extensions of the processor.
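The trade-off between cycle count and clock speed can be made concrete with a small calculation. The sketch below is only illustrative: the MCPS figures are the ones quoted above, the 10 ms frame length is that of G.729, and the two clock rates (100 MHz and 266 MHz) are assumed values chosen for the example, not figures from the cited references.

```python
# Time spent per speech frame = frame length * (MCPS / clock in MHz).
# MCPS / MHz is simply the fraction of the processor consumed by the codec.

FRAME_MS = 10.0  # G.729 works on 10 ms frames


def frame_time_ms(mcps: float, clock_mhz: float, frame_ms: float = FRAME_MS) -> float:
    """Milliseconds of processor time needed to handle one frame."""
    return frame_ms * mcps / clock_mhz


# Assumed clock rates (hypothetical, for illustration only).
dsp_ms = frame_time_ms(mcps=13.0, clock_mhz=100.0)   # single-multiplier DSP
risc_ms = frame_time_ms(mcps=35.0, clock_mhz=266.0)  # RISC core with DSP extensions

print(f"DSP : {dsp_ms:.2f} ms per frame")   # about 1.30 ms
print(f"RISC: {risc_ms:.2f} ms per frame")  # about 1.32 ms
```

Under these assumed clocks, the RISC core needs almost three times the cycles but finishes a frame in roughly the same time as the DSP.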
Narrowband Voice and Fax Processing. On a single-MAC processor such as the TI-54x, narrowband codecs such as G.729AB together with all other modules of the voice chain, including echo cancellation, DTMF, and so on, can be handled with approximately 30 to 40 MCPS, assuming the host processor is used for packetization and networking. Fax processing complexity is similar to that of the narrowband codec-based voice chain, and several voice chain modules are disabled during fax. The processing performance provisioned for the narrowband voice chain is therefore also sufficient for fax over IP [URL (Encore-T38)] processing.
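As a rough sizing aid, the voice chain budget can be turned into a channel count. This is a minimal sketch that reads the 30 to 40 MCPS figure as a per-channel cost; the 160 MHz clock and the 80% usable-cycle headroom are assumptions made for the example, and `max_channels` is a hypothetical helper name.

```python
import math


def max_channels(clock_mhz: float, mcps_per_channel: float, headroom: float = 0.8) -> int:
    """Whole voice channels that fit when only `headroom` of the cycles are usable."""
    return math.floor(clock_mhz * headroom / mcps_per_channel)


CLOCK_MHZ = 160.0  # assumed DSP clock, for illustration only

best_case = max_channels(CLOCK_MHZ, 30.0)   # 4 channels at 30 MCPS each
worst_case = max_channels(CLOCK_MHZ, 40.0)  # 3 channels at 40 MCPS each
print(best_case, worst_case)
```

Leaving headroom is a design choice: per-frame cycle counts vary with signal content, so budgeting on 100% of the clock risks overruns in the worst case.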
Wideband Voice Processing. On a single-MAC processor such as the TI-54x, wideband codecs including G.729.1 [ITU-T-G.729.1 (2006)] and G.722.2 [ITU-T-G.722.2 (2003)] require approximately 60 to 65 MCPS per channel, assuming the host processor handles packetization and networking. A wideband codec such as G.722 [ITU-T-G.722 (1988)] requires less processing and can typically be accommodated within the same processing power as the narrowband processing chain. In wideband mode, the number of echo canceller taps doubles because the sampling frequency increases from 8 to 16 kHz, and this consumes more processing than narrowband operation. Wideband voice processing also occupies more memory. In general, wideband implementations have to use processors with higher clock speeds (MHz) and more memory.
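The effect of the sampling rate on the echo canceller can be estimated with simple arithmetic. The sketch below assumes a time-domain adaptive FIR canceller of the LMS/NLMS type, costed at two MACs per tap per sample (one for filtering, one for the coefficient update), and a 32 ms echo tail. Neither the algorithm nor the tail length is stated in the text; both are assumptions for illustration, as are the function names.

```python
def ec_taps(tail_ms: int, fs_hz: int) -> int:
    """Number of FIR taps needed to span `tail_ms` of echo at sample rate `fs_hz`."""
    return tail_ms * fs_hz // 1000


def ec_macs_per_second(tail_ms: int, fs_hz: int) -> int:
    """Approximate MAC rate: 2 MACs per tap (filter + update), once per sample."""
    return 2 * ec_taps(tail_ms, fs_hz) * fs_hz


TAIL_MS = 32  # assumed echo tail length

nb_taps = ec_taps(TAIL_MS, 8000)    # 256 taps at 8 kHz
wb_taps = ec_taps(TAIL_MS, 16000)   # 512 taps at 16 kHz

nb_macs = ec_macs_per_second(TAIL_MS, 8000)
wb_macs = ec_macs_per_second(TAIL_MS, 16000)
print(wb_taps // nb_taps, wb_macs // nb_macs)  # taps double, MAC rate quadruples
```

Under these assumptions the tap count doubles, and because each sample also arrives twice as often, the MAC rate grows by a factor of four. The doubled coefficient and history buffers also account for part of the extra memory.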
19.5.1 DSP Arithmetic for Voice Processing

Signal processing operations for voice and fax mainly use MAC operations. Several basic operations are given in the C code of ITU-T codecs (e.g., the G.729 codec [ITU-T-G.729 (1996)]). The ability to execute these basic operations in a minimum number of cycles (preferably a single cycle) makes voice processing more efficient. Some important operations that help voice processing are listed here:
• Multiply-accumulate operations with built-in rounding and saturation
• Fractional format support, usually referred to as the q15 or 1.15 format, with a sign bit and a 15-bit fractional part. The results of multiplication are left shifted by 1 bit, creating the q31 or 1.31 format.
• Ability to handle the product of maximum negative numbers (e.g., multiplying two 16-bit numbers of 0x8000, the 16-bit hexadecimal representation of -1, should give a positive result). Basic multipliers in a general-purpose processor may not handle this type of operation.
• Addition and subtraction, the main arithmetic operations, with rounding and saturation
• Arithmetic shifts with saturation operations
• Exponential and normalization operations with saturation operations
• Support for efficient division routines
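Several of the operations listed above can be modeled in a few lines to show what the hardware is expected to do. The following is a minimal sketch, loosely modeled on the style of the basic operators in the ITU-T reference C code; the function names are hypothetical and the code is a behavioral illustration, not the ITU-T implementation. Python integers do not overflow, so saturation is applied explicitly.

```python
INT16_MIN, INT16_MAX = -0x8000, 0x7FFF
INT32_MIN, INT32_MAX = -0x80000000, 0x7FFFFFFF


def sat16(x: int) -> int:
    """Clamp to the signed 16-bit range."""
    return max(INT16_MIN, min(INT16_MAX, x))


def sat32(x: int) -> int:
    """Clamp to the signed 32-bit range."""
    return max(INT32_MIN, min(INT32_MAX, x))


def q15_from_word(word: int) -> int:
    """Interpret a 16-bit word (0..0xFFFF) as a two's-complement q15 value."""
    return word - 0x10000 if word & 0x8000 else word


def add16(a: int, b: int) -> int:
    """16-bit addition with saturation."""
    return sat16(a + b)


def mult_q31(a: int, b: int) -> int:
    """q15 x q15 -> q31: product left shifted by 1 bit, then saturated."""
    return sat32((a * b) << 1)


def mac_q31(acc: int, a: int, b: int) -> int:
    """Multiply-accumulate into a saturating q31 accumulator."""
    return sat32(acc + mult_q31(a, b))


def round_q31_to_q15(acc: int) -> int:
    """Round a q31 value to q15: add half an LSB with saturation, keep the top 16 bits."""
    return sat32(acc + 0x8000) >> 16


def shl32(x: int, n: int) -> int:
    """Arithmetic left shift with saturation."""
    return sat32(x << n)


def norm32(x: int) -> int:
    """Left shifts needed to normalize a 32-bit value (0 is returned for x == 0)."""
    if x == 0:
        return 0
    if x == -1:
        return 31
    if x < 0:
        x = ~x
    shifts = 0
    while x < 0x40000000:
        x <<= 1
        shifts += 1
    return shifts


minus_one = q15_from_word(0x8000)          # -1.0 in q15
print(hex(mult_q31(minus_one, minus_one)))  # 0x7fffffff: (-1)*(-1) saturates just below +1
print(hex(round_q31_to_q15(mult_q31(0x4000, 0x4000))))  # 0x2000: 0.5 * 0.5 = 0.25
```

The first printed value shows the maximum-negative case from the list: the exact product, +1.0, is not representable in q31, so the result is clamped to the largest positive value instead of wrapping around to a negative number.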
Several other important features are also available, such as parallel data access, addressing modes, circular buffering, bit reversal, and absolute-value operations. Several network processors whose architectures have been upgraded with DSP extensions cover a major part of the instructions listed above. Many useful DSP instructions can be found in processor data sheets [URL (ADI-BF536), URL (TI-54x)]. For VoIP specifically, it is important to select a processor whose instruction set supports voice processing, to reduce the number of processing cycles.
