Processing Voice Packets with Codecs and DSPs (Considering VoIP Design Elements) Part 2

Effects of Voice Activity Detection on Bandwidth

Statistically, an aggregate of 24 calls or more might contain 35 percent silence. With traditional telephony voice networks, all G.711 voice calls use 64 kbps fixed-bandwidth links regardless of how much of the conversation is speech and how much is silence. In Cisco VoIP networks, all conversations and silences are packetized. VAD can suppress packets containing silence. Instead of sending VoIP packets of silence, VoIP gateways interleave data traffic with VoIP conversations to more effectively use network bandwidth. Table 2-7 illustrates the type of bandwidth savings VAD offers.

Table 2-7 Impact of VAD on Required Bandwidth

Codec	Codec Speed (bps)	Sample Size (Bytes)	Frame Relay (bps)	Frame Relay with VAD (bps)
G.711	64,000	240	76,267	49,573
G.711	64,000	160	82,400	53,560
G.726r32	32,000	120	44,267	28,773
G.726r32	32,000	80	50,400	32,760
G.726r24	24,000	80	37,800	24,570
G.726r24	24,000	60	42,400	27,560
G.726r16	16,000	80	25,200	16,380
G.726r16	16,000	40	34,400	22,360
G.728	16,000	80	25,200	16,380
G.728	16,000	40	34,400	22,360
G.729	8000	40	17,200	11,180
G.729	8000	20	26,400	17,160
G.723r63	6300	48	12,338	8019
G.723r63	6300	24	18,375	11,944
G.723r53	5300	40	11,395	7407
G.723r53	5300	20	17,490	11,369
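
The Frame Relay column of Table 2-7 can be reproduced from the codec speed and the sample size. The sketch below assumes 46 bytes of per-packet overhead (40 bytes of IP/UDP/RTP headers plus 6 bytes of Frame Relay framing). That value is not stated in the text; it is inferred from the table's figures, and the function and constant names are illustrative only.

```python
# Assumption: 46 bytes of overhead per packet (40 bytes IP/UDP/RTP + 6 bytes
# Frame Relay). This is inferred from Table 2-7, not stated in the text.
OVERHEAD_BYTES = 46


def frame_relay_bps(codec_bps: int, payload_bytes: int,
                    overhead_bytes: int = OVERHEAD_BYTES) -> float:
    """Per-call bandwidth: codec rate scaled by (payload + overhead) / payload."""
    if payload_bytes <= 0:
        raise ValueError("payload_bytes must be positive")
    return codec_bps * (payload_bytes + overhead_bytes) / payload_bytes


# A few rows from Table 2-7: (codec, codec speed in bps, sample size in bytes)
rows = [("G.711", 64_000, 160), ("G.729", 8_000, 40), ("G.729", 8_000, 20)]
for name, rate, payload in rows:
    print(f"{name:6} {payload:3d} bytes -> {frame_relay_bps(rate, payload):,.0f} bps")
```

The formula also shows why smaller sample sizes cost more bandwidth: the fixed overhead is paid on every packet, and smaller payloads mean more packets per second.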

Note Bandwidth savings of 35 percent is an average figure and does not take into account loud background sounds, differences in languages, and other factors.

Note For the purposes of network design and bandwidth engineering, VAD should not be taken into account, especially on links that carry fewer than 24 voice calls simultaneously.

Various features, such as music on hold (MOH) and fax, render VAD ineffective. When the network is engineered for the full voice-call bandwidth, all savings provided by VAD are available to data applications.

VAD is enabled by default for all VoIP calls. In addition to suppressing silence in VoIP conversations, VAD provides comfort noise generation (CNG). Without it, a listener might mistake silence for a disconnected call. CNG plays locally generated white noise so that the call appears normally connected to both parties.

For example, a company is assessing the effect of VAD in a Frame Relay VoIP environment. The company plans to use G.729 for all voice calls crossing the WAN. Previously, it was determined that each voice call compressed with G.729 uses 26,400 bps. VAD can reduce the bandwidth utilization to approximately 17,160 bps, which constitutes a bandwidth savings of 35 percent.
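
The arithmetic in this example can be checked with a short sketch. It uses the 35 percent average savings quoted in the text; the call count of 30 is a hypothetical value chosen for illustration, and the function names are not from the source.

```python
VAD_SAVINGS_PERCENT = 35  # average figure from the text; real calls vary


def bandwidth_with_vad(bps: int, savings_percent: int = VAD_SAVINGS_PERCENT) -> float:
    """Average per-call bandwidth after VAD suppresses silence packets."""
    return bps * (100 - savings_percent) / 100


per_call_bps = 26_400  # G.729, 20-byte samples, over Frame Relay
print(bandwidth_with_vad(per_call_bps))  # 17160.0

# Hypothetical link carrying 30 simultaneous calls. Following the design note
# in the text, the link is sized for the full per-call bandwidth; the average
# VAD savings is treated as capacity left over for data traffic.
calls = 30
engineered_bps = calls * per_call_bps
average_freed_bps = engineered_bps - calls * bandwidth_with_vad(per_call_bps)
print(engineered_bps, average_freed_bps)
```

Sizing the link without VAD and treating the savings as a bonus keeps voice quality safe when MOH, fax, or background noise defeats silence suppression.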

DSP

A digital signal processor (DSP) is a specialized microprocessor designed specifically for digital signal processing. DSPs enable Cisco platforms to efficiently process digital voice traffic. DSPs on a router provide stream-to-packet signal processing functionality that includes voice compression, echo cancellation, and tone and voice activity detection.

A media resource is a software-based or hardware-based entity that performs media-processing functions on the data streams to which it is connected. Examples of media-processing functions include mixing multiple streams to create one output stream (conferencing), passing the stream from one connection to another (media termination point), converting the data stream from one compression type to another (transcoding), echo cancellation, signaling, termination of a voice stream from a TDM circuit (coding/decoding), packetization of a stream, and streaming audio (annunciation).

The terms "DSP" and "media resource" are used interchangeably in some documentation.

The four major functions of DSPs in a voice gateway are as follows:

■ Transcoding: Transcoding is the direct digital-to-digital conversion from one codec to another. Transcoding compresses and decompresses voice streams to match end-point-device capabilities. Transcoding is required when an incoming voice stream is digitized and compressed (by means of a codec) to save bandwidth, but the local device does not support that type of compression. Ideally, all IP telephony devices would support the same codecs, but this is not the case. Rather, different devices support different codecs.

Transcoding is processed by DSPs on the DSP farm. Sessions are initiated and managed by Cisco Unified Communications Manager. Cisco Unified Communications Manager also refers to transcoders as hardware MTPs.

If an application or service can handle only one specific codec type, which is usually G.711, a G.729 call from a remote site must be transcoded to G.711. This can be done only via DSP resources. Because applications and services are often hosted in main sites, DSP transcoding resources are most common in central sites.
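
The decision described above reduces to a codec-capability check. The following is a minimal sketch, not how Cisco Unified Communications Manager actually allocates transcoders; the function name and the example codec lists are hypothetical.

```python
from typing import Iterable


def needs_transcoder(stream_codec: str, supported_codecs: Iterable[str]) -> bool:
    """Return True when the receiving device cannot decode the incoming stream."""
    return stream_codec not in set(supported_codecs)


# Hypothetical application at the main site that accepts only G.711.
app_codecs = ["G.711"]
print(needs_transcoder("G.729", app_codecs))  # True: a DSP transcoder is required
print(needs_transcoder("G.711", app_codecs))  # False: the stream passes unchanged
```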

■ Voice termination: Voice termination applies to a call that has two call legs, one leg on a TDM interface and the second leg on a VoIP connection. The TDM leg must be terminated by hardware that performs coding/decoding and packetization of the stream. DSPs perform this termination function. The DSP also provides echo cancellation, voice activity detection, and jitter management at the same time it performs voice termination.

■ Media Termination Point (MTP): An MTP is an entity that accepts two full-duplex voice streams using the same codec. It bridges the media streams and allows them to be set up and torn down independently. The streaming data received from the input stream on one connection is passed to the output stream on the other connection, and vice versa. In addition, the MTP can be used to transcode a-law to mu-law and vice versa, or it can be used to bridge two connections that utilize different packetization periods. MTPs are also used to provide further processing of a call, such as RFC 2833 support.

■ Audio Conferencing: In a traditional circuit-switched voice network, all voice traffic goes through a central device (such as a PBX system), which provides audio conferencing services as well. Because IP phones transmit voice traffic directly between phones, a network-based conference bridge is required to facilitate multiparty conferences.

A conference bridge is a resource that joins multiple participants into a single call. It can accept any number of connections for a given conference, up to the maximum number of streams allowed for a single conference on that device. A one-to-one correspondence exists between media streams connected to a conference and participants connected to the conference. The conference bridge mixes the streams together and creates a unique output stream for each connected party. The output stream for a given party is the composite of the streams from all connected parties minus their own input stream. Some conference bridges mix only the three loudest talkers on the conference and distribute that composite stream to each participant (minus their own input stream if they are one of the talkers).
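
The "all parties minus their own input" mixing rule can be sketched as follows. This is a simplified illustration that assumes every participant's audio has already been decoded to 16-bit linear PCM frames of equal length; a real bridge also handles decoding, jitter, and loudest-talker selection. All names are illustrative.

```python
from typing import Dict, List

PCM_MIN, PCM_MAX = -32768, 32767  # assumed 16-bit linear PCM sample range


def mix_minus(frames: Dict[str, List[int]]) -> Dict[str, List[int]]:
    """Build one output frame per party: the sum of all inputs minus its own."""
    if not frames:
        return {}
    length = len(next(iter(frames.values())))
    if any(len(samples) != length for samples in frames.values()):
        raise ValueError("all participants must supply frames of equal length")
    # Sum every input once, then subtract each party's own samples.
    total = [sum(samples[i] for samples in frames.values()) for i in range(length)]
    return {
        party: [max(PCM_MIN, min(PCM_MAX, total[i] - samples[i]))
                for i in range(length)]
        for party, samples in frames.items()
    }


mixed = mix_minus({"alice": [100, 200], "bob": [10, 20], "carol": [1, 2]})
print(mixed["alice"])  # [11, 22]: bob + carol only
```

Summing once and subtracting per party keeps the work linear in the number of participants, and it preserves the one-to-one correspondence between input streams and output streams.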

Hardware conference bridges are used in two environments. They can be used to increase the conferencing capacity in a central site without putting an additional load on Cisco Unified Communications Manager servers, which can host software-based conference bridges. More important is the use of hardware conference bridges in remote sites. If no remote-site conference resources are deployed, every conference will be routed to central resources, resulting in sometimes-excessive WAN usage.

In addition, DSP-based conference bridges can mix G.711 and G.729 calls, thus supporting any call-type scenario in multisite environments. In contrast, software-based conference bridges deployed on Cisco Unified Communications Manager servers can mix only G.711 calls.

Other possible uses for MTPs include the following:

■ Repacketization: An MTP can be used to transcode a-law to mu-law and vice versa, or it can be used to bridge two connections that utilize different packetization periods.

■ H.323 Supplementary Services: MTPs can be used to extend supplementary services to H.323 endpoints that do not support the H.323v2 OpenLogicalChannel and CloseLogicalChannel request features of the Empty Capabilities Set (ECS). This requirement occurs infrequently. Cisco H.323 endpoints support ECS, and most third-party endpoints have support as well. When needed, an MTP is allocated and connected into a call on behalf of an H.323 endpoint. After insertion, the media streams are connected between the MTP and the H.323 device, and these connections are present for the duration of the call. The media streams connected to the other side of the MTP can be connected and disconnected as needed to implement features such as hold and transfer.

When an MTP is required on an H.323 call and none is available, the call will proceed but will not be able to invoke supplementary services.

Note Implementations prior to Cisco Unified Communications Manager Release 3.2 required MTPs to provide supplementary services for H.323 endpoints, but Cisco Unified Communications Manager Release 3.2 and later no longer require MTP resources to provide this functionality.

MTP Types

Two types of MTPs are supported on Cisco VoIP equipment (for example, Cisco IOS routers and Cisco Unified Communications servers): software MTPs and hardware MTPs.

Software MTP

A software MTP is a resource that can be implemented by installing the Cisco IP Voice Media Streaming Application on a Cisco Unified Communications Manager server or by using a Cisco IOS gateway without using DSP resources. A software MTP device supports G.711 to G.711 and G.729 to G.729 streams. A Cisco IOS-enhanced software device can be implemented on a Cisco IOS router by configuring a software-only MTP under a DSP farm. This DSP farm can be used only as a pure MTP and does not require any hardware DSPs on the router. Examples are as follows:

■ Cisco IP Voice Media Streaming Application: This software MTP is a device that is implemented by installing the Cisco IP Voice Media Streaming Application on a server. When the installed application is configured as an MTP application, it registers with a Cisco Unified Communications Manager node and informs Cisco Unified Communications Manager of how many MTP resources it supports. The IP Voice Media Streaming Application can also be used for several other functions, so a proper design must consider all of those functions together.

■ Cisco IOS-based: This MTP allows configuration of any of the following codecs, but only one can be configured at a given time: G.711 mu-law and a-law, G.729a, G.729, G.729ab, G.729b, GSM, and pass-through. However, some of these are not pertinent to a Cisco Unified Communications Manager implementation.

The router configuration permits up to 500 individual streams, which support 250 transcoded sessions. This number of G.711 streams generates 5 Mbps of traffic.

Hardware MTP

A hardware MTP is a resource that uses gateway-based DSPs to interconnect two G.711 streams. This is done without using the gateway CPU. This hardware-only implementation uses a DSP resource for endpoints using the same G.711 codec but a different packetization time. The repacketization requires a DSP resource, so it cannot be done by software only. Examples are as follows:

■ Cisco NM-HDV2, NM-HD-1V/2V/2VE, 2800 and 3800 Series Routers

    ■ This hardware uses PVDM-2 modules to provide DSPs.

    ■ Each DSP can provide 16 G.711 mu-law or a-law MTP sessions or 6 G.729, G.729b, or GSM MTP sessions.

■ Cisco WS-SVC-CMM-ACT

    ■ This module has four DSPs that can be configured individually.

    ■ Each DSP can support 128 G.729, G.729b, or GSM MTP sessions or 256 G.711 mu-law or a-law MTP sessions.

■ Catalyst WS-X6608-T1 and WS-X6608-E1

    ■ Codec support is G.711 mu-law or a-law, G.729, G.729b, or GSM.

    ■ Configuration is done at the port level. Eight ports are available per module.

    ■ Each port configured as an MTP resource provides 24 sessions.
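
The per-DSP and per-port session counts listed above can be turned into a rough sizing calculation. The sketch below uses only the figures from this list; the dictionary layout and function name are illustrative. Applying 24 sessions per WS-X6608 port to both codec families is an assumption, because the text does not break that figure down by codec.

```python
import math

# MTP sessions per unit (one DSP or one port), taken from the list above.
# The "G.729" entries also cover G.729b and GSM, as stated in the text.
MTP_SESSIONS_PER_UNIT = {
    "PVDM-2 DSP": {"G.711": 16, "G.729": 6},
    "WS-SVC-CMM-ACT DSP": {"G.711": 256, "G.729": 128},
    # Assumption: the 24-session figure applies regardless of codec.
    "WS-X6608 port": {"G.711": 24, "G.729": 24},
}


def units_needed(unit: str, codec: str, sessions: int) -> int:
    """Number of DSPs (or ports) required to host the given MTP sessions."""
    if sessions < 0:
        raise ValueError("sessions must not be negative")
    per_unit = MTP_SESSIONS_PER_UNIT[unit][codec]
    return math.ceil(sessions / per_unit)


print(units_needed("PVDM-2 DSP", "G.711", 40))  # 3
print(units_needed("PVDM-2 DSP", "G.729", 40))  # 7
```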

Hardware Conferencing and Transcoding Resources

Figure 2-14 shows a multisite environment with deployed DSP resources. Router2 in Chicago is offering DSP-based conferencing services to support mixed codec environments and optimal WAN usage.

Figure 2-14 Media Resource Deployment Example

The central gateway, Router1, offers transcoding and conferencing services. The transcoding resources can be used to transcode G.729 to G.711 and then connect to an application server or even a software-based Cisco Unified Communications Manager conference bridge.

Codec Complexity

Codec complexity refers to the amount of processing required to perform voice compression. Codec complexity affects call density (that is, the number of calls that each DSP can handle). With higher codec complexity, fewer calls can be handled. Select a higher codec complexity when it is required to support a particular codec or combination of codecs. Select a lower codec complexity to support the greatest number of voice channels, provided the lower complexity is compatible with the particular codecs in use. Cisco DSP resources use one of two types of chipsets, the older C549 DSPs and the newer C5510 DSPs. Table 2-8 illustrates the complexity modes the C549 chipset needs to run to support a variety of codecs.

Table 2-8 C549 Codec Complexity

Medium Complexity (4 calls/DSP)	High Complexity (2 calls/DSP)
G.711 (a-law and mu-law)	G.728
G.726 (all versions)	G.723 (all versions)
G.729a, G.729ab (G.729a Annex B)	G.729, G.729b (G.729-Annex B)
Fax relay	Fax relay

Some codec compression techniques require more processing power than others. For example:

■ Medium complexity allows the C549 DSPs to process up to four voice/fax relay calls per DSP and the C5510 DSPs to process up to eight voice/fax relay calls per DSP.

■ High complexity allows the C549 DSPs to process up to two voice/fax relay calls per DSP and the C5510 DSPs to process up to six voice/fax relay calls per DSP.

The difference between medium and high complexity codecs is the amount of CPU utilization necessary to process the codec algorithm, and therefore, the number of voice channels that can be supported by a single DSP. For this reason, all the medium complexity codecs can also be run in high complexity mode, but fewer (usually about half) of the channels are available per DSP.
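
The relationship between codec complexity and call density can be sketched as a small sizing helper. The calls-per-DSP figures come from the text, and the codec-to-complexity mapping comes from Table 2-8, which describes the C549 chipset. All names are illustrative, and this is not a substitute for a vendor DSP calculator.

```python
import math
from typing import Iterable

# Calls per DSP by chipset and complexity mode, as given in the text.
CALLS_PER_DSP = {
    "C549": {"medium": 4, "high": 2},
    "C5510": {"medium": 8, "high": 6},
}

# Codecs that require high complexity on the C549, per Table 2-8.
HIGH_COMPLEXITY_CODECS = {"G.728", "G.723", "G.729", "G.729b"}
MEDIUM_COMPLEXITY_CODECS = {"G.711", "G.726", "G.729a", "G.729ab"}


def required_complexity(codecs: Iterable[str]) -> str:
    """Lowest C549 complexity mode that supports every codec in use."""
    codecs = set(codecs)
    unknown = codecs - HIGH_COMPLEXITY_CODECS - MEDIUM_COMPLEXITY_CODECS
    if unknown:
        raise ValueError(f"codecs not listed in Table 2-8: {sorted(unknown)}")
    return "high" if codecs & HIGH_COMPLEXITY_CODECS else "medium"


def dsps_needed(calls: int, chipset: str, complexity: str) -> int:
    """Number of DSPs required to carry the given number of calls."""
    return math.ceil(calls / CALLS_PER_DSP[chipset][complexity])


mode = required_complexity(["G.711", "G.729a"])
print(mode, dsps_needed(10, "C549", mode))  # medium 3
```

Adding a single high-complexity codec such as G.729 to the mix forces high-complexity mode, which on the C549 halves the number of calls each DSP can carry.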