Processing Voice Packets with Codecs and DSPs (Considering VoIP Design Elements) Part 1

Because WAN bandwidth is probably the most expensive component of an enterprise network, network administrators must know how to calculate the total bandwidth required for voice traffic and how to reduce overall bandwidth consumption. This section describes in detail codecs, DSPs, codec complexity, and the bandwidth requirements for VoIP calls. Several variables affecting total bandwidth are explained, as well as how to calculate and reduce total bandwidth consumption.

Codecs

A codec is a device or program that encodes and decodes a digital data stream or signal. Various types of codecs are used to encode and decode, or compress and decompress, data that would otherwise consume large amounts of bandwidth on WAN links. Codecs are especially important on lower-speed serial links, where every bit of bandwidth counts.

One of the most important factors for a network administrator to consider while building voice networks is proper capacity planning. Network administrators must understand how much bandwidth is used for each VoIP call. To understand bandwidth, the administrator must know which codec is being utilized across the WAN link. With a thorough understanding of VoIP bandwidth and codecs, the network administrator can apply capacity planning tools.

Coding techniques are standardized by the ITU. The ITU G-series codecs are among the most popular standards for VoIP applications.

Following is a list of codecs supported by Cisco IOS gateways:

■ G.711: The international standard for encoding telephone audio on a 64 kbps channel. It is a pulse code modulation (PCM) scheme operating at an 8 kHz sample rate, with 8 bits per sample. With G.711, the encoded voice is already in the correct format for digital voice delivery in the PSTN or through PBXs. It is widely used in the telecommunications field because its logarithmic companding improves the signal-to-quantization-noise ratio for speech without increasing the number of bits per sample.

There are two subsets of the G.711 codec:

■ mu-law: mu-law is used in North American and Japanese phone networks.

■ a-law: a-law is used in Europe and elsewhere around the world.

Both mu-law and a-law subsets use digitized speech carried in 8-bit samples. They use an 8 kHz sampling rate with 64 kbps of bandwidth demand.

■ G.726: ITU-T Adaptive Differential Pulse Code Modulation (ADPCM) coding at 16, 24, 32, and 40 kbps. ADPCM-encoded voice can be interchanged between packet voice, PSTN, and PBX networks if the PBX networks are configured to support ADPCM. The four bit rates are often referred to by the number of bits per sample, which are 2, 3, 4, and 5 bits, respectively.

■ G.728: Describes a 16 kbps Low-Delay Code Excited Linear Prediction (LDCELP) variation of CELP voice compression. CELP voice coding must be translated into a public telephony format for delivery to or through the PSTN.

■ G.729: Uses Conjugate Structure Algebraic Code Excited Linear Prediction (CS-ACELP) compression to code voice into 8 kbps streams. G.729a (G.729 Annex A) requires less computation, at the cost of marginally lower speech quality. G.729b (G.729 Annex B) adds support for voice activity detection (VAD) and comfort noise generation (CNG), making G.729 more efficient in its bandwidth usage. The features of G.729a and G.729b can be combined into G.729ab. Standard G.729 operates at 8 kbps, but extensions provide 6.4 kbps (Annex D) and 11.8 kbps (Annex E) rates for marginally worse and better speech quality, respectively.

■ G.723 (formally G.723.1): Describes a dual-rate speech coder for multimedia communications. It compresses speech or other audio signal components at a very low bit rate as part of the H.324 family of standards. This codec has two bit rates associated with it:

■ r63: 6.3 kbps, using 24-byte frames and the MP-MLQ (Multipulse Maximum Likelihood Quantization) algorithm

■ r53: 5.3 kbps, using 20-byte frames and the ACELP algorithm

The higher bit rate is based on MP-MLQ technology and provides somewhat higher sound quality. The lower bit rate is based on ACELP and provides system designers with additional flexibility.

■ GSM Full Rate Codec (GSMFR): Introduced in 1987, the GSMFR speech coder has a frame size of 20 ms and operates at a bit rate of 13 kbps. GSMFR is a Regular Pulse Excitation with Long-Term Prediction (RPE-LTP) coder. To write VoiceXML scripts that function as the user interface for a simple voice-mail system, the network must support GSMFR codecs. The network messaging must be capable of recording a voice message and depositing it on an external server for later retrieval. GSMFR support is part of the Cisco infrastructure and application partner components that service providers need to deploy unified messaging applications.

■ Internet Low Bit Rate Codec (iLBC): Designed for narrowband speech, iLBC produces a payload bit rate of 13.33 kbps with 30-ms frames and 15.20 kbps with 20-ms frames. The algorithm is a version of block-independent linear predictive coding, with a choice of 20- or 30-ms frame lengths. The encoded blocks must be encapsulated in a suitable transport protocol, such as RTP. Because each block is coded independently, speech quality degrades gracefully when frames are lost, as happens when IP packets are lost or delayed.

Note: iLBC is supported on Cisco AS5350XM and Cisco AS5400XM Universal Gateways with Voice Feature Cards (VFCs), and on IP-to-IP gateways with no transcoding or conferencing.
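The codec descriptions above reduce to two pieces of arithmetic: sample-based codecs (G.711, G.726) emit a fixed number of bits for each of the 8000 samples per second, while frame-based codecs (G.723, GSMFR, iLBC) pack a fixed number of bits into each frame. The following Python sketch checks the quoted rates and frame sizes. The function names are hypothetical, and the 30 ms G.723 frame length is an assumption taken from the G.723.1 standard rather than from this text.

```python
import math

SAMPLE_RATE_HZ = 8000  # narrowband telephony sampling rate used by G.711 and G.726


def waveform_bit_rate(bits_per_sample: int) -> int:
    """Bit rate (bps) of a codec that emits a fixed number of bits per 8 kHz sample."""
    return SAMPLE_RATE_HZ * bits_per_sample


def frame_payload_bytes(bit_rate_bps: float, frame_ms: int) -> int:
    """Octets needed to carry one frame, rounded up to a whole byte."""
    bits_per_frame = bit_rate_bps * frame_ms / 1000
    return math.ceil(bits_per_frame / 8)


if __name__ == "__main__":
    # G.711 uses 8 bits per sample; G.726 uses 2 to 5 bits per sample.
    for name, bits in [("G.711", 8), ("G.726 (2-bit)", 2), ("G.726 (3-bit)", 3),
                       ("G.726 (4-bit)", 4), ("G.726 (5-bit)", 5)]:
        print(f"{name:14} {waveform_bit_rate(bits):>6} bps")

    # Frame-based codecs: (name, nominal bit rate in bps, frame length in ms).
    # Assumption: G.723 (G.723.1) frames are 30 ms long.
    for name, rate, frame_ms in [("G.723 r63", 6300, 30), ("G.723 r53", 5300, 30),
                                 ("GSMFR", 13000, 20), ("iLBC 30 ms", 13330, 30),
                                 ("iLBC 20 ms", 15200, 20)]:
        size = frame_payload_bytes(rate, frame_ms)
        print(f"{name:14} {size:>3} bytes per {frame_ms} ms frame")
```

The G.723 results (24 and 20 bytes) match the frame sizes listed for r63 and r53, and the GSMFR result reflects 260 bits per 20 ms frame rounded up to whole octets.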

The network administrator should balance the need for voice quality against the cost of bandwidth in the network when choosing codecs. The higher the codec bandwidth, the higher the cost of each call across the network.

Impact of Voice Samples and Packet Size on Bandwidth

Voice sample size is a variable that can affect total bandwidth used. A voice sample is defined as the digital output from a codec’s DSP encapsulated into a protocol data unit (PDU). Cisco uses DSPs that output samples based on digitization of 10 ms worth of audio. By default, Cisco voice equipment encapsulates 20 ms of audio in each PDU for most codecs; G.723, whose frames are 30 ms long, is the exception. You can apply an optional configuration command to vary the number of samples encapsulated. Encapsulating more samples per PDU reduces total bandwidth because fewer packets, and therefore fewer headers, are sent each second. However, it also produces larger PDUs, which can add variable delay and cause severe gaps in the audio if PDUs are dropped. Table 2-5 demonstrates how the number of packets required to transmit one second of audio varies with voice sample size.

Table 2-5 Impact of Voice Samples

Codec	Bandwidth (bps)	Sample Size (Bytes)	Packets per Second
G.711	64,000	240	33
G.711	64,000	160	50
G.726r32	32,000	120	33
G.726r32	32,000	80	50
G.726r24	24,000	80	37.5
G.726r24	24,000	60	50
G.726r16	16,000	80	25
G.726r16	16,000	40	50
G.728	16,000	80	25
G.728	16,000	40	50
G.729	8000	40	25
G.729	8000	20	50
G.723r63	6300	48	16
G.723r63	6300	24	33
G.723r53	5300	40	17
G.723r53	5300	20	33
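The packets-per-second figures in Table 2-5 follow from dividing the codec's output in bytes per second by the number of bytes carried in each packet. A minimal Python sketch, with a hypothetical helper name, computes that column from the codec bandwidth and sample size columns:

```python
# (codec, codec bandwidth in bps, sample size in bytes), as listed in Table 2-5
TABLE_2_5 = [
    ("G.711", 64000, 240), ("G.711", 64000, 160),
    ("G.726r32", 32000, 120), ("G.726r32", 32000, 80),
    ("G.726r24", 24000, 80), ("G.726r24", 24000, 60),
    ("G.726r16", 16000, 80), ("G.726r16", 16000, 40),
    ("G.728", 16000, 80), ("G.728", 16000, 40),
    ("G.729", 8000, 40), ("G.729", 8000, 20),
    ("G.723r63", 6300, 48), ("G.723r63", 6300, 24),
    ("G.723r53", 5300, 40), ("G.723r53", 5300, 20),
]


def packets_per_second(codec_bps: int, sample_bytes: int) -> float:
    """Packets needed to carry one second of audio at the given sample size."""
    bytes_per_second = codec_bps / 8
    return bytes_per_second / sample_bytes


for codec, bps, sample in TABLE_2_5:
    pps = packets_per_second(bps, sample)
    print(f"{codec:9} {bps:>6} bps  {sample:>3} bytes  {pps:5.1f} packets/s")
```

For any one codec, halving the sample size doubles the packet rate, so the smaller sample size in each pair always carries more header overhead per second.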

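Using a simple formula, you can determine the number of bytes encapsulated in a PDU from the codec bandwidth and the sample duration (20 ms is the default):

Sample_Size = (Sample_Duration * codec_Speed) / 8

Here, Sample_Duration is in seconds, codec_Speed is in bits per second, and Sample_Size is in bytes. If you apply G.711 numbers, the formula reveals the following:

Sample_Size = (0.020 * 64000) / 8
Sample_Size = 160 bytes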
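Because the sample size in bytes equals the sample duration in seconds times the codec speed in bits per second, divided by 8, it can be computed for every codec at once. The sketch below is a hypothetical Python helper that takes the duration in milliseconds, so the arithmetic stays exact for these codecs. G.723 is left out because its frames are padded to whole bytes, so the formula yields 23.625 rather than 24 bytes for r63.

```python
def sample_size_bytes(sample_ms: int, codec_speed_bps: int) -> float:
    """Bytes of codec output in one PDU: duration (s) x speed (bps) / 8 bits per byte."""
    # Dividing by 8000 combines the ms-to-seconds conversion (/1000) with bits-to-bytes (/8).
    return sample_ms * codec_speed_bps / 8000


CODEC_SPEEDS = {"G.711": 64000, "G.726r32": 32000, "G.726r24": 24000,
                "G.726r16": 16000, "G.728": 16000, "G.729": 8000}

for codec, speed in CODEC_SPEEDS.items():
    print(f"{codec:9} 20 ms -> {sample_size_bytes(20, speed):4.0f} bytes, "
          f"30 ms -> {sample_size_bytes(30, speed):4.0f} bytes")
```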

Notice from Table 2-5 that the larger the sample size, the larger the packet and the fewer packets that must be sent each second. Because every packet carries the same header overhead, sending fewer packets reduces total bandwidth.

Data Link Overhead

Another contributing factor to bandwidth is the Layer 2 protocol used to transport VoIP. VoIP alone carries a 40-byte IP/UDP/RTP header (20 bytes IP, 8 bytes UDP, and 12 bytes RTP), assuming uncompressed RTP. Depending on the Layer 2 protocol used, the overhead can grow substantially, and more bandwidth is required to transport VoIP frames with larger Layer 2 overhead. The following list shows the Layer 2 overhead for various protocols:

■ Ethernet II: Carries 18 bytes of overhead—6 bytes for source MAC, 6 bytes for destination MAC, 2 bytes for type, and 4 bytes for cyclic redundancy check (CRC)

■ MLP: Carries 6 bytes of overhead—1 byte for flag, 1 byte for address, 2 bytes for control (or type), and 2 bytes for CRC

■ Frame Relay Forum Standard 12 (FRF.12): Carries 6 bytes of overhead—2 bytes for data-link connection identifier (DLCI) header, 2 bytes for FRF.12 header, and 2 bytes for CRC
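To see how much of each voice packet is header rather than speech, add the Layer 2 overhead from the list above to the 40-byte IP/UDP/RTP header and the voice payload. The Python sketch below assumes a G.729 call with 20-byte samples; the dictionary and function names are hypothetical.

```python
IP_UDP_RTP_BYTES = 40  # uncompressed: 20-byte IP + 8-byte UDP + 12-byte RTP header

# Per-frame Layer 2 overhead in bytes, from the list above
LAYER2_OVERHEAD = {"Ethernet II": 18, "MLP": 6, "FRF.12": 6}


def packet_bytes(layer2_bytes: int, payload_bytes: int) -> int:
    """Total bytes on the wire for one voice packet."""
    return layer2_bytes + IP_UDP_RTP_BYTES + payload_bytes


PAYLOAD = 20  # assumption: G.729 with 20-byte samples

for protocol, l2 in LAYER2_OVERHEAD.items():
    total = packet_bytes(l2, PAYLOAD)
    header_share = (total - PAYLOAD) / total
    print(f"{protocol:12} {total} bytes per packet, {header_share:.0%} of it header")
```

On Ethernet, roughly three-quarters of every such G.729 packet is header, which is why the Layer 2 choice matters so much for low-bit-rate codecs.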

Security and Tunneling Overhead

Certain security and tunneling encapsulations also add overhead to voice packets and should be considered when calculating bandwidth requirements. When using a virtual private network (VPN), IP Security (IPsec) will add 50 to 57 bytes of overhead, a significant amount when considering the relatively small voice-packet size. Layer 2 Tunneling Protocol/generic routing encapsulation (L2TP/GRE) adds 24 bytes. When using MLP, 6 bytes will be added to each packet. Multiprotocol Label Switching (MPLS) adds a 4-byte label to every packet. All these specialized tunneling and security protocols must be considered when planning for bandwidth demands.

For example, many companies have their employees telecommute from home. These employees often initiate a VPN connection into their enterprise for secure Internet transmission. When deploying a remote telephone at the employee’s home using a router and a PBX Off-Premises eXtension (OPX), the voice packets experience additional overhead associated with the VPN.
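As a rough illustration, tunneling overhead can be treated as extra bytes added to every packet in the per-call bandwidth formula given later in this section. The Python sketch below makes that simplifying assumption (it ignores, for example, IPsec padding that varies with payload size, which is why IPsec is quoted as a range) and uses a G.729 call with 20-byte samples over Frame Relay. The function name is hypothetical.

```python
FRAME_RELAY_L2 = 6  # FRF.12 overhead in bytes
IP_UDP_RTP = 40     # uncompressed IP/UDP/RTP header in bytes


def bandwidth_with_tunnel(tunnel_bytes: int, sample_bytes: int = 20,
                          codec_bps: int = 8000) -> float:
    """Per-call bandwidth (bps, one direction) with a fixed tunnel overhead per packet.

    Simplifying assumption: the tunnel adds the same number of bytes to every packet.
    """
    packet = FRAME_RELAY_L2 + IP_UDP_RTP + tunnel_bytes + sample_bytes
    # Multiply before dividing so whole-number results stay exact.
    return packet * codec_bps / sample_bytes


for label, extra in [("No tunnel", 0), ("L2TP/GRE", 24),
                     ("IPsec (low)", 50), ("IPsec (high)", 57)]:
    print(f"{label:13} {bandwidth_with_tunnel(extra):>8,.0f} bps")
```

Under these assumptions, IPsec alone raises the G.729 call from 26,400 bps to between 46,400 and 49,200 bps, roughly 1.8 times the untunneled figure.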

Calculating the Total Bandwidth for a VoIP Call

Codec choice, data-link overhead, sample size, and RTP header compression have positive and negative impacts on total bandwidth, as demonstrated in Table 2-6.

Table 2-6 Total Bandwidth Required

Codec	Codec Speed (bps)	Sample Size (Bytes)	Frame Relay (bps)	Frame Relay with cRTP (bps)	Ethernet (bps)
G.711	64,000	240	76,267	66,133	79,467
G.711	64,000	160	82,400	67,200	87,200
G.726r32	32,000	120	44,267	34,133	47,467
G.726r32	32,000	80	50,400	35,200	55,200
G.726r24	24,000	80	37,800	26,400	41,400
G.726r24	24,000	60	42,400	27,200	47,200
G.726r16	16,000	80	25,200	17,600	27,600
G.726r16	16,000	40	34,400	19,200	39,200
G.728	16,000	80	25,200	17,600	27,600
G.728	16,000	40	34,400	19,200	39,200
G.729	8000	40	17,200	9600	19,600
G.729	8000	20	26,400	11,200	31,200
G.723r63	6300	48	12,338	7350	13,913
G.723r63	6300	24	18,375	8400	21,525
G.723r53	5300	40	11,395	6360	12,985
G.723r53	5300	20	17,490	7420	20,670

To perform the calculations, you must consider these contributing factors as part of the equation:

■ A higher codec bit rate requires more total bandwidth.

■ More data-link overhead requires more total bandwidth.

■ A larger sample size requires less total bandwidth, because fewer packets (and therefore fewer headers) are sent each second.

■ RTP header compression (cRTP) requires significantly less total bandwidth.

Consider a sample total bandwidth calculation. A company is implementing VoIP to carry voice calls between all sites. WAN connections between sites will carry both data and voice. To use bandwidth efficiently and keep costs to a minimum, voice traffic traversing the WAN will be compressed using the G.729 codec with 20-byte voice samples. WAN connectivity will be through a Frame Relay provider.

The following formula calculates the total bandwidth required per call, in each direction:

Total_Bandwidth = ([Layer_2_Overhead + IP_UDP_RTP_Overhead + Sample_Size] / Sample_Size) * codec_Speed

Calculation for the G.729 codec, 20-byte sample size, using Frame Relay without Compressed RTP (cRTP) is as follows:

Total_Bandwidth = ([6 + 40 + 20] / 20) * 8000
Total_Bandwidth = 26,400 bps

Calculation for the G.729 codec, 20-byte sample size, using Frame Relay with cRTP (which compresses the 40-byte IP/UDP/RTP header to 2 bytes) is as follows:

Total_Bandwidth = ([6 + 2 + 20] / 20) * 8000
Total_Bandwidth = 11,200 bps
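The per-call formula is easy to script. The following Python sketch is a hypothetical helper, not a Cisco tool; it regenerates the three bandwidth columns of Table 2-6 from the codec speed and sample size, using 6 bytes for Frame Relay, 18 bytes for Ethernet, 40 bytes for the uncompressed IP/UDP/RTP header, and 2 bytes for the cRTP header. Results are rounded half up to whole bits per second, as in the table.

```python
FRAME_RELAY = 6   # Layer 2 overhead in bytes (FRF.12)
ETHERNET = 18     # Layer 2 overhead in bytes (Ethernet II)
IP_UDP_RTP = 40   # uncompressed IP/UDP/RTP header in bytes
CRTP = 2          # compressed RTP header in bytes


def total_bandwidth(layer2: int, header: int, sample_size: int, codec_speed: int) -> float:
    """Total_Bandwidth = ([Layer_2 + header + Sample_Size] / Sample_Size) * codec_Speed."""
    # Multiplying before dividing keeps whole-number results exact.
    return (layer2 + header + sample_size) * codec_speed / sample_size


def round_half_up(value: float) -> int:
    """Round a non-negative value to the nearest integer, with .5 rounding up."""
    return int(value + 0.5)


ROWS = [("G.711", 64000, 240), ("G.711", 64000, 160),
        ("G.726r32", 32000, 120), ("G.726r32", 32000, 80),
        ("G.726r24", 24000, 80), ("G.726r24", 24000, 60),
        ("G.726r16", 16000, 80), ("G.726r16", 16000, 40),
        ("G.728", 16000, 80), ("G.728", 16000, 40),
        ("G.729", 8000, 40), ("G.729", 8000, 20),
        ("G.723r63", 6300, 48), ("G.723r63", 6300, 24),
        ("G.723r53", 5300, 40), ("G.723r53", 5300, 20)]

print(f"{'Codec':9} {'Bytes':>5} {'FR':>8} {'FR+cRTP':>8} {'Ethernet':>8}")
for codec, speed, size in ROWS:
    fr = total_bandwidth(FRAME_RELAY, IP_UDP_RTP, size, speed)
    fr_crtp = total_bandwidth(FRAME_RELAY, CRTP, size, speed)
    eth = total_bandwidth(ETHERNET, IP_UDP_RTP, size, speed)
    print(f"{codec:9} {size:>5} {round_half_up(fr):>8,} "
          f"{round_half_up(fr_crtp):>8,} {round_half_up(eth):>8,}")
```

Changing the Layer 2 or header constants lets you repeat the calculation for other links, for example a different tunnel or compression overhead.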
