Processing Voice Packets with Codecs and DSPs (Considering VoIP Design Elements) Part 1

Because WAN bandwidth is probably the most expensive component of an enterprise network, network administrators must know how to calculate the total bandwidth required for voice traffic and how to reduce overall bandwidth consumption. This section describes in detail codecs, DSPs, codec complexity, and the bandwidth requirements for VoIP calls. Several variables affecting total bandwidth are explained, as well as how to calculate and reduce total bandwidth consumption.

Codecs

A codec is a device or program capable of performing encoding and decoding on a digital data stream or signal. Various types of codecs are used to encode and decode or compress and decompress data that would otherwise use large amounts of bandwidth on WAN links. Codecs are especially important on lower-speed serial links where every bit of bandwidth is needed and utilized to ensure network reliability.

One of the most important factors for a network administrator to consider while building voice networks is proper capacity planning. Network administrators must understand how much bandwidth is used for each VoIP call. To understand bandwidth, the administrator must know which codec is being utilized across the WAN link. With a thorough understanding of VoIP bandwidth and codecs, the network administrator can apply capacity planning tools.

Coding techniques are standardized by the ITU. The ITU G-series codecs are among the most popular standards for VoIP applications.


Following is a list of codecs supported by Cisco IOS gateways:

■ G.711: The international standard for encoding telephone audio on a 64 kbps channel. It is a PCM scheme operating at an 8 kHz sample rate, with 8 bits per sample. With G.711, the encoded voice is already in the correct format for digital voice delivery in the PSTN or through PBXs. It is widely used in the telecommunications field because it improves the signal-to-noise ratio without increasing the amount of data.

There are two subsets of the G.711 codec:

■ mu-law: mu-law is used in North American and Japanese phone networks.

■ a-law: a-law is used in Europe and elsewhere around the world.

Both mu-law and a-law subsets use digitized speech carried in 8-bit samples. They use an 8 kHz sampling rate with 64 kbps of bandwidth demand.

■ G.726: ITU-T Adaptive Differential Pulse Code Modulation (ADPCM) coding at 40, 32, 24, and 16 kbps. ADPCM-encoded voice can be interchanged between packet voice, PSTN, and PBX networks if the PBX networks are configured to support ADPCM. The four bit rates associated with G.726 are often referred to by the bit size of a sample, which are 5 bits, 4 bits, 3 bits, and 2 bits, respectively.

■ G.728: Describes a 16 kbps Low-Delay Code Excited Linear Prediction (LD-CELP) variation of CELP voice compression. CELP voice coding must be translated into a public telephony format for delivery to or through the PSTN.

■ G.729: Uses Conjugate Structure Algebraic Code Excited Linear Prediction (CS-ACELP) compression to code voice into 8 kbps streams. G.729a (that is, G.729 Annex A) requires less computation, at the cost of marginally lower speech quality. G.729b (that is, G.729 Annex B) adds support for voice activity detection (VAD) and comfort noise generation (CNG), which makes G.729 more efficient in its bandwidth usage. The features of G.729a and G.729b can be combined into G.729ab. Standard G.729 operates at 8 kbps, but extensions provide 6.4 kbps (Annex D) and 11.8 kbps (Annex E) rates for marginally worse and better speech quality, respectively.

■ G.723: Describes a dual-rate speech coder for multimedia communications. This compression technique can be used for compressing speech or audio signal components at a very low bit rate as part of the H.324 family of standards. This codec has two bit rates associated with it:

■ r63: 6.3 kbps; using 24-byte frames and the MP-MLQ (Multipulse Maximum Likelihood Quantization) algorithm

■ r53: 5.3 kbps; using 20-byte frames and the ACELP algorithm

The higher bit rate is based on MP-MLQ technology and provides a somewhat higher quality of sound. The lower bit rate is based on ACELP and provides system designers with additional flexibility.

■ GSM Full Rate Codec (GSMFR): Introduced in 1987, the GSMFR speech coder has a frame size of 20 ms and operates at a bit rate of 13 kbps. GSMFR is an RPE-LTP (Regular Pulse Excitation with Long-Term Prediction) coder. To write VoiceXML scripts that can function as the user interface for a simple voice-mail system, the network must support GSMFR codecs. The network messaging must be capable of recording a voice message and depositing the message to an external server for later retrieval. This codec supports the Cisco infrastructure and application partner components required for service providers to deploy unified messaging applications.

■ Internet Low Bit Rate Codec (iLBC): Designed for narrow band speech, it results in a payload bit rate of 13.33 kbps for 30-ms frames and 15.20 kbps for 20-ms frames. The algorithm is a version of Block-Independent Linear Predictive Coding, with the choice of data frame lengths of 20 and 30 milliseconds. The encoded blocks have to be encapsulated in a suitable protocol for transport, such as RTP. This codec enables graceful speech quality degradation in the case of lost frames, which occurs in connection with lost or delayed IP packets.

Note: iLBC is supported on Cisco AS5350XM and Cisco AS5400XM Universal Gateways with Voice Feature Cards (VFCs) and IP-to-IP gateways with no transcoding and conferencing.

The network administrator should balance the need for voice quality against the cost of bandwidth in the network when choosing codecs. The higher the codec bandwidth, the higher the cost of each call across the network.
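The bit rates quoted for the waveform codecs above follow directly from the sampling rate and the number of bits per sample. The following is a minimal sketch of that arithmetic; the function name `waveform_bit_rate` is hypothetical, and the sketch assumes G.726 samples at the same 8 kHz rate stated for G.711.

```python
def waveform_bit_rate(sample_rate_hz: int, bits_per_sample: int) -> int:
    """Return the codec bit rate in bits per second."""
    return sample_rate_hz * bits_per_sample


SAMPLE_RATE_HZ = 8000  # 8 kHz, as stated for G.711; assumed here for G.726 too

# G.711 carries 8 bits per sample; G.726 carries 5, 4, 3, or 2 bits per sample.
rates = {
    "G.711 (8-bit)": waveform_bit_rate(SAMPLE_RATE_HZ, 8),
    "G.726 (5-bit)": waveform_bit_rate(SAMPLE_RATE_HZ, 5),
    "G.726 (4-bit)": waveform_bit_rate(SAMPLE_RATE_HZ, 4),
    "G.726 (3-bit)": waveform_bit_rate(SAMPLE_RATE_HZ, 3),
    "G.726 (2-bit)": waveform_bit_rate(SAMPLE_RATE_HZ, 2),
}

for name, bps in rates.items():
    print(f"{name}: {bps} bps")
```

This relationship applies only to PCM and ADPCM codecs. The CELP-based codecs (G.728, G.729, G.723) encode frames of speech rather than individual samples, so their bit rates cannot be derived this way.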

Impact of Voice Samples and Packet Size on Bandwidth

Voice sample size is a variable that can affect total bandwidth used. A voice sample is defined as the digital output from a codec’s DSP encapsulated into a protocol data unit (PDU). Cisco uses DSPs that output samples based on digitization of 10 ms worth of audio. Cisco voice equipment encapsulates 20 ms of audio in each PDU by default, regardless of the codec used. You can apply an optional configuration command to vary the number of samples encapsulated. When you encapsulate more samples per PDU, the total bandwidth is reduced. However, encapsulating more samples per PDU comes at the risk of larger PDUs, which can cause variable delay and severe gaps if PDUs are dropped. Table 2-5 demonstrates how the number of packets required to transmit one second of audio varies with voice sample sizes.

Table 2-5 Impact of Voice Samples

| Codec | Bandwidth (bps) | Sample Size (Bytes) | Packets per Second |
|---|---|---|---|
| G.711 | 64,000 | 240 | 33 |
| G.711 | 64,000 | 160 | 50 |
| G.726r32 | 32,000 | 120 | 33 |
| G.726r32 | 32,000 | 80 | 50 |
| G.726r24 | 24,000 | 80 | 25 |
| G.726r24 | 24,000 | 60 | 33 |
| G.726r16 | 16,000 | 80 | 25 |
| G.726r16 | 16,000 | 40 | 50 |
| G.728 | 16,000 | 80 | 13 |
| G.728 | 16,000 | 40 | 25 |
| G.729 | 8000 | 40 | 25 |
| G.729 | 8000 | 20 | 50 |
| G.723r63 | 6300 | 48 | 16 |
| G.723r63 | 6300 | 24 | 33 |
| G.723r53 | 5300 | 40 | 17 |
| G.723r53 | 5300 | 20 | 33 |

Using a simple formula, it is possible for you to determine the number of bytes encapsulated in a PDU based on the codec bandwidth and the sample size (20 ms is the default):

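Bytes_per_Sample = (Sample_Size * Codec_Bandwidth) / 8

Here Sample_Size is expressed in seconds and Codec_Bandwidth in bits per second. If you apply G.711 numbers, the formula reveals the following:

Bytes_per_Sample = (0.020 * 64000) / 8

Bytes_per_Sample = 160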

Notice from Table 2-5 that the larger the sample size, the larger each packet and the fewer packets that must be sent per second. Because each packet carries a fixed amount of header overhead, sending fewer packets reduces total bandwidth.
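The relationship between codec bandwidth, sample duration, payload size, and packet rate can be sketched in a few lines. This is an illustration only; the function names `bytes_per_pdu` and `packets_per_second` are hypothetical, and the sample duration is passed in milliseconds to keep the arithmetic exact.

```python
def bytes_per_pdu(codec_bps: int, sample_ms: int) -> float:
    """Payload bytes produced by the codec for one PDU of sample_ms audio."""
    # bits per second * seconds / 8 bits per byte, with ms folded into 8000
    return codec_bps * sample_ms / 8000


def packets_per_second(codec_bps: int, payload_bytes: float) -> float:
    """Number of PDUs needed to carry one second of audio."""
    return (codec_bps / 8) / payload_bytes


# G.711 at the default 20 ms, and at 30 ms for comparison
for sample_ms in (20, 30):
    payload = bytes_per_pdu(64000, sample_ms)
    pps = packets_per_second(64000, payload)
    print(f"G.711, {sample_ms} ms: {payload:.0f} bytes, {pps:.1f} packets/s")

# G.729 at the default 20 ms
payload = bytes_per_pdu(8000, 20)
print(f"G.729, 20 ms: {payload:.0f} bytes, "
      f"{packets_per_second(8000, payload):.1f} packets/s")
```

Running this for G.711 shows the tradeoff directly: moving from 20 ms to 30 ms of audio per PDU raises the payload from 160 to 240 bytes and lowers the packet rate from 50 to about 33 packets per second.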

Data Link Overhead

Another contributing factor to bandwidth is the Layer 2 protocol used to transport VoIP. VoIP alone carries a 40 byte IP/UDP/RTP header, assuming uncompressed RTP. Depending on the Layer 2 protocol used, the overhead could grow substantially. More bandwidth is required to transport VoIP frames with larger Layer 2 overhead. The following illustrates the Layer 2 overhead for various protocols:

■ Ethernet II: Carries 18 bytes of overhead—6 bytes for source MAC, 6 bytes for destination MAC, 2 bytes for type, and 4 bytes for cyclic redundancy check (CRC)

■ MLP: Carries 6 bytes of overhead—1 byte for flag, 1 byte for address, 2 bytes for control (or type), and 2 bytes for CRC

■ Frame Relay Forum Standard 12 (FRF.12): Carries 6 bytes of overhead—2 bytes for data-link connection identifier (DLCI) header, 2 bytes for FRF.12 header, and 2 bytes for CRC
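To see how much of each frame is overhead, the per-protocol figures listed above can be combined with the 40-byte IP/UDP/RTP header. The sketch below is illustrative; the dictionary and function names are hypothetical, and the byte counts are taken from the list above.

```python
IP_UDP_RTP_BYTES = 40  # uncompressed IP/UDP/RTP header

# Layer 2 overhead in bytes, as listed above
L2_OVERHEAD_BYTES = {
    "Ethernet II": 18,
    "MLP": 6,
    "FRF.12": 6,
}


def frame_bytes(payload_bytes: int, l2_name: str) -> int:
    """Total bytes on the wire for one voice frame."""
    return L2_OVERHEAD_BYTES[l2_name] + IP_UDP_RTP_BYTES + payload_bytes


def overhead_fraction(payload_bytes: int, l2_name: str) -> float:
    """Share of the frame that is header overhead rather than voice."""
    total = frame_bytes(payload_bytes, l2_name)
    return (total - payload_bytes) / total


# A 20-byte G.729 payload over each Layer 2 protocol
for name in L2_OVERHEAD_BYTES:
    total = frame_bytes(20, name)
    share = overhead_fraction(20, name)
    print(f"{name}: {total} bytes per frame, {share:.0%} overhead")
```

With a small payload such as 20 bytes of G.729 audio, headers make up well over half of every frame, which is why the choice of Layer 2 protocol matters for low-bit-rate codecs.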

Security and Tunneling Overhead

Certain security and tunneling encapsulations also add overhead to voice packets and should be considered when calculating bandwidth requirements. When using a virtual private network (VPN), IP Security (IPsec) will add 50 to 57 bytes of overhead, a significant amount when considering the relatively small voice-packet size. Layer 2 Tunneling Protocol/generic routing encapsulation (L2TP/GRE) adds 24 bytes. When using MLP, 6 bytes will be added to each packet. Multiprotocol Label Switching (MPLS) adds a 4-byte label to every packet. All these specialized tunneling and security protocols must be considered when planning for bandwidth demands.

For example, many companies have their employees telecommute from home. These employees often initiate a VPN connection into their enterprise for secure Internet transmission. When deploying a remote telephone at the employee’s home using a router and a PBX Off-Premises eXtension (OPX), the voice packets experience additional overhead associated with the VPN.
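As a rough illustration of the telecommuter case, the sketch below estimates the per-call bandwidth of a G.729 call with 20-byte payloads when tunneling overhead is added to the 40-byte IP/UDP/RTP header. The function name `tunneled_bandwidth_bps` is hypothetical, the overhead figures are the ones quoted above, and Layer 2 overhead is deliberately left out, so these numbers are a lower bound.

```python
IP_UDP_RTP_BYTES = 40  # uncompressed IP/UDP/RTP header


def tunneled_bandwidth_bps(codec_bps: int, payload_bytes: int,
                           tunnel_bytes: int) -> float:
    """Per-call bandwidth in bps, excluding Layer 2 overhead.

    Each packet carries payload_bytes of voice plus the IP/UDP/RTP header
    and tunnel_bytes of security or tunneling overhead.
    """
    packet_bytes = payload_bytes + IP_UDP_RTP_BYTES + tunnel_bytes
    return codec_bps * packet_bytes / payload_bytes


# Overhead per packet in bytes, as quoted in the text above
scenarios = {
    "No tunnel": 0,
    "MPLS label": 4,
    "L2TP/GRE": 24,
    "IPsec (low end)": 50,
    "IPsec (high end)": 57,
}

for name, extra in scenarios.items():
    bps = tunneled_bandwidth_bps(8000, 20, extra)
    print(f"{name}: {bps:,.0f} bps")
```

Under these assumptions, IPsec roughly doubles the bandwidth of an untunneled G.729 call, from 24,000 bps to between 44,000 and 46,800 bps.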

Calculating the Total Bandwidth for a VoIP Call

Codec choice, data-link overhead, sample size, and RTP header compression have positive and negative impacts on total bandwidth, as demonstrated in Table 2-6.

Table 2-6 Total Bandwidth Required

| Codec | Codec Speed (bps) | Sample Size (Bytes) | Frame Relay (bps) | Frame Relay with cRTP (bps) | Ethernet (bps) |
|---|---|---|---|---|---|
| G.711 | 64,000 | 240 | 76,267 | 66,133 | 79,467 |
| G.711 | 64,000 | 160 | 82,400 | 67,200 | 87,200 |
| G.726r32 | 32,000 | 120 | 44,267 | 34,133 | 47,467 |
| G.726r32 | 32,000 | 80 | 50,400 | 35,200 | 55,200 |
| G.726r24 | 24,000 | 80 | 37,800 | 26,400 | 41,400 |
| G.726r24 | 24,000 | 60 | 42,400 | 27,200 | 47,200 |
| G.726r16 | 16,000 | 80 | 25,200 | 17,600 | 27,600 |
| G.726r16 | 16,000 | 40 | 34,400 | 19,200 | 39,200 |
| G.728 | 16,000 | 80 | 25,200 | 17,600 | 27,600 |
| G.728 | 16,000 | 40 | 34,400 | 19,200 | 39,200 |
| G.729 | 8000 | 40 | 17,200 | 9600 | 19,600 |
| G.729 | 8000 | 20 | 26,400 | 11,200 | 31,200 |
| G.723r63 | 6300 | 48 | 12,338 | 7350 | 13,913 |
| G.723r63 | 6300 | 24 | 18,375 | 8400 | 21,525 |
| G.723r53 | 5300 | 40 | 11,395 | 6360 | 12,985 |
| G.723r53 | 5300 | 20 | 17,490 | 7420 | 20,670 |

To perform the calculations, you must consider these contributing factors as part of the equation:

■ A higher codec bit rate requires more total bandwidth.

■ More overhead associated with the data link requires more total bandwidth.

■ Larger sample size requires less total bandwidth.

■ RTP header compression requires significantly less total bandwidth.

Consider a sample total bandwidth calculation. A company is implementing VoIP to carry voice calls between all sites. WAN connections between sites will carry both data and voice. To use bandwidth efficiently and keep costs to a minimum, voice traffic traversing the WAN will be compressed using the G.729 codec with 20-byte voice samples. WAN connectivity will be through a Frame Relay provider.

The following calculation is used to calculate total bandwidth required per call:

Total_Bandwidth = ([Layer_2_Overhead + IP_UDP_RTP_Overhead + Sample_Size] / Sample_Size) * Codec_Speed

Calculation for the G.729 codec, 20-byte sample size, using Frame Relay without Compressed RTP (cRTP) is as follows:

Total_Bandwidth = ([6 + 40 + 20] / 20) * 8000

Total_Bandwidth = 26,400 bps

Calculation for G.729 codec, 20-byte sample size, using Frame Relay with cRTP is as follows:

Total_Bandwidth = ([6 + 2 + 20] / 20) * 8000

Total_Bandwidth = 11,200 bps
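The total bandwidth formula is easy to script when several codec and link combinations need to be compared. The following is a minimal sketch; the function name `total_bandwidth_bps` and its parameter names are hypothetical, while the overhead values (6 bytes for Frame Relay, 18 bytes for Ethernet, 40 bytes for IP/UDP/RTP, 2 bytes with cRTP) come from the figures used in this section.

```python
def total_bandwidth_bps(codec_bps: int, sample_bytes: int,
                        l2_overhead: int, ip_udp_rtp_overhead: int = 40) -> float:
    """Per-call bandwidth in bps from the total bandwidth formula."""
    frame = l2_overhead + ip_udp_rtp_overhead + sample_bytes
    # Multiply before dividing to keep intermediate values exact
    return codec_bps * frame / sample_bytes


FRAME_RELAY = 6   # bytes of Layer 2 overhead
ETHERNET = 18     # bytes of Layer 2 overhead
CRTP = 2          # compressed IP/UDP/RTP header, in bytes

# G.729 with 20-byte samples, as in the worked example
print(round(total_bandwidth_bps(8000, 20, FRAME_RELAY)))        # Frame Relay
print(round(total_bandwidth_bps(8000, 20, FRAME_RELAY, CRTP)))  # with cRTP
print(round(total_bandwidth_bps(8000, 20, ETHERNET)))           # Ethernet
```

The three calls correspond to the G.729, 20-byte row of Table 2-6. To size a link, multiply the per-call figure by the number of simultaneous calls the link must carry.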
