Bandwidth Calculation (Cisco VoIP Implementations)

Computing the exact amount of bandwidth needed for each VoIP call is necessary for planning and provisioning sufficient bandwidth in LANs and WANs. The previous section referenced parts of this computation; this section covers VoIP bandwidth calculation thoroughly. The impact of packet size, Layer 2 overhead, tunneling, security, and voice activity detection (VAD) is considered in this discussion.

Impact of Voice Samples and Packet Size on Bandwidth

A DSP converts an analog voice signal to a digital voice signal using a particular codec. Depending on the codec used, the DSP generates a certain number of bits per second. The bits that are generated for 10 milliseconds (ms) of analog voice signal form one digital voice sample. The size of the digital voice sample therefore depends on the codec used. Table 1-6 shows how the digital voice sample size changes based on the codec used. The number of voice bytes for two digital voice samples using different codecs is shown in the last column.

Table 1-6 Examples of Voice Payload Size Using Different Codecs

| Codec: Bandwidth | Size of Digital Voice Sample for 10 ms of Analog Voice in Bits | Size of 10 ms Digitized Voice in Bytes | Size of Two Digital Voice Samples (20 ms) |
|---|---|---|---|
| G.711: 64 Kbps | 64,000 bps X 10/1000 sec = 640 bits | 80 bytes | 2 X 80 = 160 bytes |
| G.726 r32: 32 Kbps | 32,000 bps X 10/1000 sec = 320 bits | 40 bytes | 2 X 40 = 80 bytes |
| G.726 r24: 24 Kbps | 24,000 bps X 10/1000 sec = 240 bits | 30 bytes | 2 X 30 = 60 bytes |
| G.726 r16: 16 Kbps | 16,000 bps X 10/1000 sec = 160 bits | 20 bytes | 2 X 20 = 40 bytes |
| G.728: 16 Kbps | 16,000 bps X 10/1000 sec = 160 bits | 20 bytes | 2 X 20 = 40 bytes |
| G.729: 8 Kbps | 8000 bps X 10/1000 sec = 80 bits | 10 bytes | 2 X 10 = 20 bytes |
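The arithmetic behind Table 1-6 can be sketched in a few lines of Python. This is only an illustration: the dictionary `CODEC_BPS` and the function `voice_payload_bytes` are hypothetical names introduced here, and the codec bandwidths are simply the values listed in the table.

```python
# Codec bandwidths in bits per second, copied from Table 1-6.
CODEC_BPS = {
    "G.711": 64_000,
    "G.726 r32": 32_000,
    "G.726 r24": 24_000,
    "G.726 r16": 16_000,
    "G.728": 16_000,
    "G.729": 8_000,
}


def voice_payload_bytes(codec_bps: int, period_ms: int) -> int:
    """Return the bytes of digitized voice produced in period_ms milliseconds."""
    bits = codec_bps * period_ms // 1000  # bits generated in the period
    return bits // 8                      # 8 bits per byte


for name, bps in CODEC_BPS.items():
    one_sample = voice_payload_bytes(bps, 10)   # one 10-ms voice sample
    two_samples = voice_payload_bytes(bps, 20)  # two samples (20 ms)
    print(f"{name}: {one_sample} bytes per 10 ms, {two_samples} bytes per 20 ms")
```

Integer arithmetic is used on purpose so that the results match the table exactly for whole-millisecond periods.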

The total size of a Layer 2 frame encapsulating a VoIP packet depends on the following factors:

■ Packet rate and packetization size—Packet rate, specified in packets per second (pps), is inversely proportional to packetization size, which is the amount of voice that is digitized and encapsulated in each IP packet. Packetization size is expressed in bytes and depends on the codec used and the amount of voice that is digitized. For example, if two 10-ms digitized voice samples (total of 20 ms voice) are encapsulated in each IP packet, the packet rate will be 1 over 0.020, or 50 pps, and if G.711 is used, the packetization size will be 160 bytes. (See Table 1-6.)

■ IP overhead—IP overhead refers to the total number of bytes in the RTP, UDP, and IP headers. With no RTP header compression, the IP overhead is 40 bytes. If cRTP with no header checksum is applied to a link, the IP overhead drops to 2 bytes; with header checksum, the IP overhead is 4 bytes.

■ Data link overhead—Data link layer overhead is always present, but its size depends on the type of encapsulation (frame type) and whether link compression is applied. For instance, the data link layer overhead of Ethernet is 18 bytes (it is 22 bytes with 802.1Q).

■ Tunneling overhead—Tunneling overhead is only present if some type of tunneling is used. Generic routing encapsulation (GRE), Layer 2 Tunneling Protocol (L2TP), IP security (IPsec), QinQ (802.1Q), and Multiprotocol Label Switching (MPLS) are common tunneling techniques with their own usage reasons and benefits. Each tunneling approach adds a specific number of overhead bytes to the frame.

Codecs are of various types. The size of each VoIP packet depends on the codec type used and the number of voice samples encapsulated in each IP packet. The number of bits per second that each codec generates is referred to as codec bandwidth. The following is a list of some ITU codec standards, along with a brief description for each:

■ G.711 is PCM—Based on the 8000 samples per second rate and 8 bits per sample, PCM generates 64,000 bits per second, or 64 Kbps. No compression is performed.

■ G.726 is adaptive differential pulse code modulation (ADPCM)—Instead of constantly sending 8 bits per sample, fewer bits per sample, which only describe the change from the previous sample, are sent. If the number of bits (that describe the change) sent is 4, 3, or 2, G.726 generates 32 Kbps, 24 Kbps, or 16 Kbps respectively, and it is correspondingly called G.726 r32, G.726 r24, or G.726 r16.

■ G.722 is a wideband speech encoding standard—G.722 divides the input signal into two subbands and encodes each subband using a modified version of ADPCM. G.722 supports a bit rate of 64 Kbps, 56 Kbps, or 48 Kbps.

■ G.728 is low delay code excited linear prediction (LD-CELP)—G.728 uses codes that describe voice samples generated by human vocal cords, and it utilizes a prediction technique. Wave shapes of five samples (equivalent of 40 bits in PCM) are expressed with 10-bit codes; therefore, the G.728 bandwidth drops to 16 Kbps.

■ G.729 is conjugate structure algebraic code excited linear prediction (CS-ACELP)—G.729 generates 8 Kbps, which corresponds to 10 bytes for each 10-ms voice sample. (See Table 1-6.)

DSPs produce one digital voice sample for 10 milliseconds (ms) of analog voice signal. It is common among Cisco voice-enabled devices to put two digital voice samples in one IP packet, but it is possible to put three or four samples in one IP packet if desired. The packetization period is the amount of analog voice signal (expressed in milliseconds) that is encapsulated in each IP packet (in digitized format). The merit of more voice samples in a packet (a longer packetization period, in other words) is a reduction in the overhead-to-payload ratio.

The problem with putting too many digital voice samples in one IP packet is that when a packet is dropped, too much voice is lost, and the loss has a more noticeable negative effect on the quality of the call. The other drawback of a longer packetization period (more than two or three digital voice samples in one IP packet) is the extra packetization delay it introduces. More voice bits mean a larger IP packet, and a larger IP packet takes a longer packetization period to fill.
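The trade-off just described can be made concrete with a short Python sketch. It assumes G.729 (8 Kbps) and the 40-byte IP overhead without cRTP mentioned earlier; the function name `packetization_tradeoff` and its return layout are hypothetical and exist only for this illustration.

```python
IP_OVERHEAD_BYTES = 40  # IP + UDP + RTP headers without cRTP, per the text
SAMPLE_MS = 10          # each digital voice sample covers 10 ms of speech


def packetization_tradeoff(codec_bps: int, samples_per_packet: int) -> dict:
    """Summarize payload size, header overhead ratio, and voice per packet."""
    period_ms = samples_per_packet * SAMPLE_MS
    payload_bytes = codec_bps * period_ms // 8000  # bits/s * ms / 1000 / 8
    return {
        # Also the amount of voice lost when one packet is dropped.
        "period_ms": period_ms,
        "payload_bytes": payload_bytes,
        "overhead_to_payload": IP_OVERHEAD_BYTES / payload_bytes,
    }


for n in (1, 2, 3, 4):
    print(n, packetization_tradeoff(8_000, n))
```

As the number of samples per packet grows, the overhead-to-payload ratio falls, while the amount of voice carried (and therefore lost on a drop) and the time needed to fill a packet both grow.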

Table 1-7 shows a few examples to demonstrate the combined effect of codec used and packetization period (number of digitized 10-ms voice samples per packet) on the voice encapsulating IP packet (VoIP) size and on the packet rate. The examples in Table 1-7 do not use compressed RTP and make no reference to the effects of Layer 2 and tunneling overheads.

Table 1-7 Packet Size and Packet Rate Variation Examples

| Codec and Packetization Period (Number of Encapsulated Digital Voice Samples) | Codec Bandwidth | Voice Payload | IP Overhead | Total IP (VoIP) Packet Size | Packet Rate |
|---|---|---|---|---|---|
| G.711 with 20-ms packetization period (two 10-ms samples) | 64 Kbps | 160 bytes | 40 bytes | 200 bytes | 50 pps |
| G.711 with 30-ms packetization period (three 10-ms samples) | 64 Kbps | 240 bytes | 40 bytes | 280 bytes | 33.33 pps |
| G.729 with 20-ms packetization period (two 10-ms samples) | 8 Kbps | 20 bytes | 40 bytes | 60 bytes | 50 pps |
| G.729 with 40-ms packetization period (four 10-ms samples) | 8 Kbps | 40 bytes | 40 bytes | 80 bytes | 25 pps |
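The rows of Table 1-7 follow from two formulas: payload size is codec bandwidth times packetization period, and packet rate is the inverse of the packetization period. The following Python sketch reproduces them under the table's own assumptions (no cRTP, no Layer 2 or tunneling overhead); `packet_stats` is a hypothetical helper name.

```python
def packet_stats(codec_bps: int, period_ms: int, ip_overhead: int = 40):
    """Return (voice payload bytes, total IP packet bytes, packets per second)."""
    payload = codec_bps * period_ms // 8000  # bits/s * ms / 1000 / 8
    total = payload + ip_overhead            # add IP + UDP + RTP headers
    pps = 1000 / period_ms                   # inverse of the period in seconds
    return payload, total, pps


examples = [
    ("G.711, 20 ms", 64_000, 20),
    ("G.711, 30 ms", 64_000, 30),
    ("G.729, 20 ms", 8_000, 20),
    ("G.729, 40 ms", 8_000, 40),
]
for label, bps, period in examples:
    payload, total, pps = packet_stats(bps, period)
    print(f"{label}: payload {payload} B, packet {total} B, {pps:.2f} pps")
```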

Data Link Overhead

Transmitting an IP packet over a link requires encapsulation of the IP packet in a frame that is appropriate for the data link layer protocol provisioned on that link. For instance, if the data link layer protocol used on a link is PPP, the interface connected to that link must be configured for PPP encapsulation. In other words, any packet to be transmitted out of that interface must be encapsulated in a PPP frame. When a router routes a packet, the packet can enter the router via an interface with a certain encapsulation type such as Ethernet, and it can leave the router through another interface with a different encapsulation such as PPP. After the Ethernet frame enters the router via the ingress interface, the IP packet is de-encapsulated. Next, the routing decision directs the packet to the egress interface. The packet has to be encapsulated in the frame proper for the egress interface data link protocol before it is transmitted.

Different data link layer protocols have a different number of bytes on the frame header; for VoIP purposes, these are referred to as data link overhead bytes. Data link overhead bytes for Ethernet, Frame Relay, Multilink PPP (MLP), and Dot1Q (802.1Q) are 18, 6, 6, and 22 bytes in that order, to name a few. During calculation of the total bandwidth required for a VoIP call, for each link type (data link layer protocol or encapsulation), you must consider the appropriate data link layer overhead.
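As a rough illustration of how the link type changes the frame size, the sketch below adds the data link overhead values quoted above to a G.729 packet with a 20-ms packetization period and no cRTP. The dictionary and function names are hypothetical.

```python
# Data link overhead in bytes, as quoted in the text.
DATA_LINK_OVERHEAD = {
    "Ethernet": 18,
    "Frame Relay": 6,
    "MLP": 6,
    "802.1Q": 22,
}


def frame_size(voice_payload: int, link_type: str, ip_overhead: int = 40) -> int:
    """Total Layer 2 frame size in bytes for one VoIP packet."""
    return voice_payload + ip_overhead + DATA_LINK_OVERHEAD[link_type]


for link in DATA_LINK_OVERHEAD:
    print(f"{link}: {frame_size(20, link)} bytes per frame")
```

The same 60-byte IP packet therefore occupies a different number of bytes on each hop, which is why the calculation has to be repeated per link type.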

Security and Tunneling Overhead

IPsec is an IETF protocol suite for secure transmission of IP packets. IPsec can operate in two modes: Transport mode or Tunnel mode. In Transport mode, encryption is applied only to the payload of the IP packet, whereas in Tunnel mode, encryption is applied to the whole IP packet, including the header. When the IP header is encrypted, the intermediate routers can no longer analyze and route the IP packet. Therefore, in Tunnel mode, the encrypted IP packet must be encapsulated in another IP packet, whose header is used for routing purposes. The new and extra header added in Tunnel mode means 20 extra bytes in overhead. In both Transport mode and Tunnel mode, either an Authentication Header (AH) or an Encapsulating Security Payload (ESP) header is added to the IP header. AH provides authentication only, whereas ESP provides authentication and encryption. As a result, ESP is used more often. AH, ESP, and the extra IP header of the Tunnel mode are the IPsec overheads to consider during VoIP bandwidth calculation. IPsec also adds extra delay to the packetization process at the sending and receiving ends.

Other common tunneling methods and protocols are not focused on security. IP packets or data link layer frames can be tunneled over a variety of protocols; the following is a short list of common tunneling protocols:

■ GRE—GRE transports Layer 3 (network layer) packets, such as IP packets, or Layer 2 (data link) frames, over IP.

■ Layer 2 Forwarding (L2F) and L2TP—L2F and L2TP transport PPP frames over IP.

■ PPP over Ethernet (PPPoE)—PPPoE transports PPP frames over Ethernet frames.

■ 802.1Q tunneling (QinQ)—An 802.1Q frame with multiple 802.1Q headers is called QinQ. Layer 2 switching engines forward the QinQ frame based on the VLAN number in the top 802.1Q header. When the top header is removed, forwarding of the frame based on the VLAN number in the lower 802.1Q header begins.

Whether one of the preceding tunneling protocols, IPsec in Tunnel mode, or any other tunneling protocol is used, the tunnel header is always present and is referred to as tunneling overhead. If any tunneling protocol is used, the tunneling overhead must be considered in VoIP bandwidth calculation. Table 1-8 shows the tunneling overhead—in other words, the tunnel header size—for a variety of tunneling options.

Table 1-8 IPsec and Main Tunneling Protocols Overheads

| Protocol | Header Size | Comments |
|---|---|---|
| IPsec Transport Mode | 30 to 37 bytes | With ESP header utilizing DES or 3DES for encryption and MD5 or SHA-1 for authentication. (DES and 3DES require the payload size to be multiples of 8 bytes; therefore, 0 to 7 bytes of padding might be necessary.) |
| IPsec Transport Mode | 38 to 53 bytes | With ESP header utilizing AES for encryption and AES-XCBC for authentication. (AES requires the payload size to be multiples of 16 bytes; therefore, 0 to 15 bytes of padding might be necessary.) |
| IPsec Tunnel Mode | 50 to 57 bytes | Extra 20 bytes must be added to the IPsec transport mode header size for the extra IP header in Tunnel mode |
| IPsec Tunnel Mode | 58 to 73 bytes | |
| | 24 bytes | |
| | 24 bytes | |
| | 4 bytes | |
| | 8 bytes | |

If a company connects two of its sites over the public Internet using IPsec in Tunnel mode (also called IPsec VPN), you must be able to calculate the total size of the IP packet encapsulating voice (VoIP). To do that, you need to know the codec used, the packetization period, and whether compressed RTP is used. The fictitious company under discussion uses the G.729 codec for site-to-site IP Telephony and a 20-ms packetization period (two 10-ms equivalent digital voice samples per packet); it does not utilize cRTP. For IPsec, assume Tunnel mode with an ESP header utilizing 3DES for encryption and SHA-1 for authentication. The voice payload size with G.729 and a 20-ms packetization period will be 20 bytes. IP, UDP, and RTP headers add 40 bytes to the voice payload, bringing the total to 60 bytes. Because 60 is not a multiple of 8, 4 bytes of padding are added to bring the total to 64 bytes. Finally, the ESP header of 30 bytes and the extra IP header of 20 bytes bring the total packet size to 114 bytes. The ratio of total IP packet size to the size of the voice payload is 114 over 20, or more than 500 percent! Notice that without IPsec (in Tunnel mode), the total size of the IP packet (VoIP) would have been 60 bytes.
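The worked example above can be traced step by step in Python. This sketch follows the text's arithmetic only (pad the original packet to a multiple of the cipher block size, then add the 30-byte ESP figure and the 20-byte outer IP header); it is not a byte-accurate model of ESP, and the function name and parameter names are hypothetical.

```python
def ipsec_tunnel_packet_size(
    voice_payload: int,
    ip_udp_rtp: int = 40,  # IP + UDP + RTP headers without cRTP
    block_size: int = 8,   # DES/3DES block size from Table 1-8
    esp_header: int = 30,  # ESP overhead figure used in the text
    outer_ip: int = 20,    # extra IP header added in Tunnel mode
) -> int:
    """Total IP packet size in bytes, following the text's worked example."""
    inner = voice_payload + ip_udp_rtp  # original VoIP packet
    padding = (-inner) % block_size     # bytes needed to reach a block multiple
    return inner + padding + esp_header + outer_ip


size = ipsec_tunnel_packet_size(20)  # G.729, 20-ms packetization period
print(size, "bytes, ratio to payload:", size / 20)
```

For AES, the same sketch could be called with `block_size=16`, although the ESP overhead figure would then also differ, as Table 1-8 indicates.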

Calculating the Total Bandwidth for a VoIP Call

Calculating the bandwidth that a VoIP call consumes involves consideration for all the factors discussed thus far. Some fields and protocols are required, each of which might offer implementation alternatives. Other protocols and fields are optional. You use the bandwidth consumed by each VoIP call to calculate the total bandwidth required for the aggregate of simultaneous VoIP calls over LAN and WAN connections. This information is required for the following purposes:

■ Designing and planning link capacities

■ Deployment of CAC

■ Deployment of quality of service (QoS)

QoS can be defined as the ability of a network to provide services to different applications as per their particular requirements. Those services can include guarantees to control end-to-end delay, packet loss, jitter, and guaranteed bandwidth based on the needs of each application. CAC is used to control the number of concurrent calls to prevent oversubscription of the resources guaranteed for VoIP calls.

Computing the bandwidth consumed by a VoIP call involves six major steps:

Step 1 Determine the codec and the packetization period. Different codecs generate different numbers of bits per second (also called codec bandwidth), and they generally range from 5.3 Kbps to 64 Kbps. The number of digital voice samples (each of which is equivalent to 10 ms of analog voice) encapsulated in each IP packet determines the packetization period. A packetization period of 20 ms, which is the default in Cisco voice-enabled devices, means that each VoIP packet will encapsulate two 10-ms digital voice samples.

Step 2 Determine the link-specific information; this includes discovering whether cRTP is used and what the data link layer protocol (encapsulation type) is. You must also find out if any security or tunneling protocols and features are used on the link.

Step 3 Calculate the packetization size or, in other words, calculate the size of voice payload based on the information gathered in Step 1. Multiplying the codec bandwidth by the packetization period and dividing the result by 8 results in the size of voice payload in bytes. Please note that the packetization period is usually expressed in milliseconds, so you first must divide this number by 1000 to convert it to seconds. If G.729 with the codec bandwidth of 8 Kbps is used and the packetization period is 20 ms, the voice payload size will equal 20 bytes: 8000 (bps) multiplied by 0.020 (seconds) and divided by 8 (bits per byte) yields 20 bytes.

Step 4 Calculate the total frame size. Add the size of IP, UDP, and RTP headers, or cRTP header if applied, plus the optional tunneling headers and the data link layer header determined in Step 2, to the size of voice payload (packetization size) determined in Step 3. The result is the total frame size. If the voice payload size is 20 bytes, adding 40 bytes for RTP, UDP, and IP, and adding 6 bytes for PPP will result in a frame size of 66 bytes (without usage of cRTP and any tunneling or security features).

Step 5 Calculate the packet rate. The packet rate is the inverse of the packetization period (converted to seconds). For example, if the packetization period is 20 ms, which is equivalent to 0.020 seconds, the packet rate is equal to 1 divided by 0.020, resulting in a packet rate of 50 packets per second (pps).

Step 6 Calculate the total bandwidth. The total bandwidth consumed by one VoIP call is computed by multiplying the total frame size (from Step 4), converted to bits, by the packet rate (from Step 5). For instance, if the total frame size is 66 bytes, which is equivalent to 528 bits, and the packet rate is 50 pps, multiplying 528 by 50 results in a total bandwidth of 26,400 bits per second, or 26.4 Kbps.
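Steps 3 through 6 can be expressed as one small Python function, with the choices from Steps 1 and 2 passed in as arguments. This is a minimal sketch under the assumptions of the running example (G.729, 20 ms, PPP, no cRTP, no tunneling); the function name and parameter names are hypothetical.

```python
def voip_call_bandwidth_bps(
    codec_bps: int,           # Step 1: codec bandwidth
    period_ms: int,           # Step 1: packetization period
    ip_overhead: int = 40,    # Step 2: 40 without cRTP; 2 or 4 with cRTP
    l2_overhead: int = 6,     # Step 2: data link header (6 bytes for PPP)
    tunnel_overhead: int = 0  # Step 2: security/tunneling headers, if any
) -> float:
    """Bandwidth consumed by one VoIP call in one direction, in bits per second."""
    payload = codec_bps * period_ms // 8000  # Step 3: voice payload in bytes
    frame = payload + ip_overhead + tunnel_overhead + l2_overhead  # Step 4
    pps = 1000 / period_ms                   # Step 5: packet rate
    return frame * 8 * pps                   # Step 6: bits per frame times pps


print(voip_call_bandwidth_bps(8_000, 20))  # G.729 over PPP: 26400.0
```

Changing `ip_overhead` to 2 in the same call shows the effect of cRTP without a header checksum on the same link.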

Figure 1-14 shows VoIP framing and two methods for computing the bandwidth required for a VoIP call. Method 1 displayed in Figure 1-14 is based on the six-step process just discussed.

The second method for calculating voice bandwidth is shown as Method 2 in Figure 1-14. This method is based on the ratio shown on the bottom of Figure 1-14: The ratio of total bandwidth over codec bandwidth is equal to the ratio of total frame size over voice payload size. If G.729 is used and the packetization period is 20 milliseconds, the voice payload size will be 20 bytes. With PPP encapsulation and no cRTP, security, or tunneling, the total frame size adds up to 66 bytes. The ratio of total frame size to voice payload size is 66 over 20, which is equal to the ratio of voice bandwidth over codec bandwidth (8 Kbps for G.729). Therefore, 66 multiplied by 8 Kbps and divided by 20 results in a voice bandwidth of 26.4 Kbps.
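A quick way to convince yourself that the two methods agree is to compute both for the same inputs. The sketch below does that for the running example; both function names are hypothetical, and the frame and payload sizes are the ones given in the text.

```python
def bandwidth_method_1(frame_bytes: int, period_ms: int) -> float:
    """Method 1: frame size in bits multiplied by the packet rate."""
    return frame_bytes * 8 * (1000 / period_ms)


def bandwidth_method_2(frame_bytes: int, payload_bytes: int, codec_bps: int) -> float:
    """Method 2: codec bandwidth scaled by the frame-to-payload ratio."""
    return frame_bytes * codec_bps / payload_bytes


m1 = bandwidth_method_1(66, 20)         # 66-byte frame, 20-ms period
m2 = bandwidth_method_2(66, 20, 8_000)  # 66-byte frame, 20-byte payload, G.729
print(m1, m2)
```

Method 2 needs no packet rate, which makes it convenient when the payload size and codec bandwidth are already known from a table.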

Figure 1-14 Computing the VoIP Bandwidth Requirement


After you compute the bandwidth for one voice call, you can base the total bandwidth for VoIP on the maximum number of concurrent VoIP calls you expect or are willing to allow using CAC. The bandwidth required by VoIP and other (non-VoIP) applications added together generally should not exceed 75 percent of the bandwidth of any link. VoIP signaling also consumes bandwidth, but it takes much less bandwidth than actual VoIP talk (audio) packets. QoS tools and techniques treat VoIP signaling and VoIP data (audio) packets differently, so VoIP signaling bandwidth and QoS considerations need special attention.
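The 75 percent guideline can be turned into a simple capacity check. The following Python sketch estimates how many concurrent calls fit on a link once other traffic is accounted for; the function name `max_concurrent_calls`, the 512-Kbps link, and the 100 Kbps of other traffic are assumptions chosen purely for illustration, and signaling bandwidth is ignored.

```python
def max_concurrent_calls(
    link_bps: int,
    per_call_bps: int,
    other_traffic_bps: int = 0,
    max_utilization_pct: int = 75,  # guideline from the text
) -> int:
    """Number of calls that fit within the utilization ceiling of a link."""
    usable = link_bps * max_utilization_pct // 100 - other_traffic_bps
    return max(0, usable // per_call_bps)


# Hypothetical 512-Kbps link carrying G.729 calls over PPP (26.4 Kbps each).
print(max_concurrent_calls(512_000, 26_400))
print(max_concurrent_calls(512_000, 26_400, other_traffic_bps=100_000))
```

A figure like this is the kind of limit that CAC would then enforce on the link.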

Effects of VAD on Bandwidth

VAD is a feature that is available in voice-enabled networks. VAD detects silence (speech pauses) and one-way audio and generates no data during those periods; as a result, it produces bandwidth savings. This does not happen in circuit-switched voice networks such as the PSTN, where a channel (usually a 64 Kbps DS0) is dedicated to a call regardless of the amount of activity on that circuit.

It is common for about one-third of a regular voice call to be silence; therefore, the concept of VAD for bandwidth saving is promising. One instance of a modern-day situation is when a caller is put on hold and listens to music on hold (MOH); in this situation, audio flows in one direction only, and it is not necessary to send data from the person on hold to anywhere.

The amount of bandwidth savings experienced based on VAD depends on the following factors:

■ Type of audio—During a regular telephone call, only one person speaks at a time (usually!); therefore, no data needs to be sent from the silent party toward the speaking party. The same argument applies when a caller is put on hold or when the person gets MOH.

■ Background noise level—If the background noise is too loud, VAD does not detect silence and offers no savings. In other words, the background noise is transmitted as regular audio.

■ Other factors—Differences in language and culture and the type of communication might vary the amount of bandwidth savings due to VAD. During a conference, or when one person is lecturing other(s), the listeners remain silent, and VAD certainly takes advantage of that.

Studies have shown that even though VAD can produce about 35 percent bandwidth savings, its results depend heavily on the aforementioned factors. The 35 percent bandwidth savings is based on a distribution of different call types; it is only realized if at least 24 active voice calls are on a link. If you expect fewer than 24 calls, the bandwidth savings due to VAD should not be included in the bandwidth calculations. Conservative planners do not count on the VAD savings; in other words, even though they use the VAD feature, they do not include the VAD bandwidth savings in their calculations.
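The planning rule in the preceding paragraph can be sketched as follows. The 35 percent figure and the 24-call threshold come from the text; the function name, its parameters, and the `count_vad` switch for the conservative approach are hypothetical.

```python
def planned_voice_bandwidth_bps(
    per_call_bps: int,
    calls: int,
    count_vad: bool = True,
    vad_savings_pct: int = 35,    # approximate savings cited in the text
    min_calls_for_vad: int = 24,  # savings are only realized at this scale
) -> int:
    """Aggregate voice bandwidth to provision, optionally discounting for VAD."""
    total = per_call_bps * calls
    if count_vad and calls >= min_calls_for_vad:
        total = total * (100 - vad_savings_pct) // 100
    return total


print(planned_voice_bandwidth_bps(26_400, 10))                   # too few calls
print(planned_voice_bandwidth_bps(26_400, 24))                   # VAD counted
print(planned_voice_bandwidth_bps(26_400, 24, count_vad=False))  # conservative
```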
