VoIP Fundamentals (Introducing Voice over IP Networks) Part 2

PRI/BRI Backhaul

A Primary Rate Interface (PRI) and Basic Rate Interface (BRI) backhaul is an internal interface between the call agent (such as Cisco UCM) and Cisco gateways. It is a separate channel for backhauling signaling information. A PRI backhaul forwards PRI Layer 3 (Q.931) signaling information via a TCP connection.

An MGCP gateway is relatively easy to configure. Because the call agent has all the call-routing intelligence, you do not need to configure the gateway with all the dial peers it would otherwise need. A downside is that a call agent must always be available. Cisco MGCP gateways can use Survivable Remote Site Telephony (SRST) and MGCP fallback to allow the H.323 protocol to take over and provide local call routing in the absence of a Communications Manager (for example, during a WAN outage). In that case, you must configure dial peers on the gateway for use by H.323.

Session Initiation Protocol

SIP is a protocol developed by the Internet Engineering Task Force (IETF) Multiparty Multimedia Session Control (MMUSIC) Working Group as an alternative to H.323. SIP features are compliant with IETF RFC 2543, published in March 1999; RFC 3261, published in June 2002; and RFC 3665, published in December 2003. Because SIP is a common standard based on the logic of the World Wide Web and is very simple to implement, it is widely used with gateways and proxy servers within service provider networks for internal and end-customer signaling.

SIP is a peer-to-peer protocol where user agents (UAs) initiate sessions, similar to H.323. However, unlike H.323, SIP uses ASCII-text-based messages to communicate. Therefore, you can implement and troubleshoot SIP very easily.
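Because SIP messages are plain text, a few lines of string handling are enough to inspect one. The following Python sketch is illustrative only: the function name parse_sip_request and the sample INVITE (addresses, tag, branch, and Call-ID values) are invented for this example, and the parser deliberately ignores real-world details such as repeated headers and header line folding.

```python
def parse_sip_request(raw: str) -> dict:
    """Split a SIP request into its start line, headers, and body.

    Simplified sketch: a repeated header overwrites the earlier one,
    and folded (multi-line) headers are not handled.
    """
    head, _, body = raw.partition("\r\n\r\n")
    lines = head.split("\r\n")
    # Start line looks like: METHOD request-URI SIP-version
    method, uri, version = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return {
        "method": method,
        "uri": uri,
        "version": version,
        "headers": headers,
        "body": body,
    }


# Hypothetical message, made up for illustration.
SAMPLE_INVITE = (
    "INVITE sip:bob@example.com SIP/2.0\r\n"
    "Via: SIP/2.0/UDP pc33.example.com;branch=z9hG4bK776asdhds\r\n"
    "From: Alice <sip:alice@example.com>;tag=1928301774\r\n"
    "To: Bob <sip:bob@example.com>\r\n"
    "Call-ID: a84b4c76e66710@pc33.example.com\r\n"
    "CSeq: 314159 INVITE\r\n"
    "Content-Length: 0\r\n"
    "\r\n"
)

request = parse_sip_request(SAMPLE_INVITE)
print(request["method"], request["uri"])
```

The same readability is what makes a packet capture of SIP traffic easy to follow during troubleshooting.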

Because SIP is a peer-to-peer protocol, the Cisco UCM does not control SIP devices, and SIP devices do not register with Cisco UCM. As with H.323 gateways, only the IP address is available on Cisco UCM to confirm that communication between a Cisco UCM and a SIP voice gateway is possible.

Skinny Client Control Protocol

SCCP is a Cisco proprietary protocol that is used for the communication between Cisco UCM and terminal endpoints. SCCP is a client-server protocol, meaning any event (such as on-hook, off-hook, or buttons pressed) causes a message to be sent to a Cisco UCM. Cisco UCM then sends specific instructions back to the device to tell it what to do about the event. Therefore, each press on a phone button causes data traffic between Cisco UCM and the terminal endpoint. SCCP is widely used with Cisco IP Phones. The major advantage of SCCP within Cisco UCM networks is its proprietary nature, which allows you to make quick changes to the protocol and add features and functionality.

SCCP is a simplified protocol used in VoIP networks. Cisco IP Phones that use SCCP can coexist in an H.323 environment. When used with Cisco Communications Manager, an SCCP client can interoperate with H.323-compliant terminals.

Comparing VoIP Signaling Protocols

The primary goal for all four of the previously mentioned VoIP signaling protocols is the same: to create a bidirectional Real-time Transport Protocol (RTP) stream between VoIP endpoints involved in a conversation. However, VoIP signaling protocols use different architectures and procedures to achieve this goal.

H.323

H.323 is considered a peer-to-peer protocol, although H.323 is not a single protocol. Rather, it is a suite of protocols. The necessary gateway configuration is relatively complex, because you need to define the dial plan and route patterns directly on the gateway. Examples of H.323-capable devices are the Cisco VG224 Analog Phone Gateway and the Cisco 2600XM Series, Cisco 2800 Series, 3700 Series, and 3800 Series routers.

The H.323 protocol is responsible for all the signaling between a Cisco UCM cluster and an H.323 gateway. The ISDN protocols, Q.921 and Q.931, are used only on the Integrated Services Digital Network (ISDN) link to the PSTN, as illustrated in Figure 1-2.

Figure 1-2 H.323 Signaling

MGCP

The MGCP protocol is based on a client/server architecture. That simplifies the configuration because the dial plan and route patterns are defined directly on a Cisco UCM server within a cluster. Examples of MGCP-capable devices are the Cisco VG224 Analog Phone Gateway and the Cisco 2600XM Series, 2800 Series, 3700 Series, and 3800 Series routers. Non-IOS MGCP gateways include the Cisco Catalyst 6608-E1 and Catalyst 6608-T1 module.

MGCP is used to manage a gateway. All ISDN Layer 3 information is backhauled to a Cisco UCM server. Only the ISDN Layer 2 information (Q.921) is terminated on the gateway, as depicted in Figure 1-3.

Figure 1-3 MGCP Signaling

SIP

Like H.323, SIP is a peer-to-peer protocol. The configuration necessary for the gateway is relatively complex because the dial plan and route patterns need to be defined directly on the gateway. Examples of SIP-capable devices are the Cisco 2800 Series and 3800 Series routers.

The SIP protocol is responsible for all the signaling between a Cisco UCM cluster and a gateway. The ISDN protocols, Q.921 and Q.931, are used only on an ISDN link to the PSTN, as illustrated in Figure 1-4.

Figure 1-4 SIP Signaling

SCCP

SCCP works in a client/server architecture, as shown in Figure 1-5, which simplifies the configuration of SCCP devices such as Cisco IP Phones and Cisco ATA 180 Series and VG200 Series FXS gateways.

Figure 1-5 SCCP Signaling

SCCP is used on Cisco VG224 and VG248 analog phone gateways. ATAs enable communications between Cisco UCM and a gateway. The gateway then uses standard analog signaling to an analog device connected to the ATA’s FXS port. Recent versions of Cisco IOS voice gateways—for example, the 2800 series—also support SCCP controlled Foreign Exchange Station (FXS) ports.

VoIP Service Considerations

In traditional telephony networks, dedicated bandwidth for each voice stream provides voice with a guaranteed delay across the network. Because bandwidth is guaranteed in a TDM environment, no variable delay (that is, jitter) exists. Configuring voice in a data network requires network services with low delay, minimal jitter, and minimal packet loss. Bandwidth requirements must be properly calculated based on the codec used and the number of concurrent connections. QoS must be configured to minimize jitter and loss of voice packets. The PSTN provides 99.999 percent availability (that is, the five nines of availability). To match the availability of the PSTN, an IP network must be designed with redundancy and failover mechanisms. Security policies must be established to address both network stability and voice-stream security.

Table 1-1 lists issues associated with implementing VoIP in a converged network and solutions that address these issues.

Table 1-1 Issues and Solutions for VoIP in a Converged Network




Issue: Delay

■ Increase bandwidth.

■ Choose a different codec type.

■ Fragment data packets.

■ Prioritize voice packets.

Issue: Jitter

■ Use dejitter buffers.

■ Prioritize voice packets.

Issue: Bandwidth

■ Calculate bandwidth requirements, including voice payload, overhead, and data.

Issue: Packet loss

■ Design the network to minimize congestion.

■ Prioritize voice packets.

■ Use codecs to minimize small amounts of packet loss.

Issue: Availability

■ Provide redundancy for hardware, links, and power (uninterruptible power supply [UPS]).

■ Perform proactive network management.

Issue: Security

■ Secure the network infrastructure, call-processing systems, endpoints, and applications.

Media Transmission Protocols

In a VoIP network, the actual voice data (conversations) are transported across the transmission media using RTP and RTP Control Protocol (RTCP). RTP defines a standardized packet format for delivering audio and video over the Internet. RTCP is a companion protocol to RTP as it provides for the delivery of control information for individual RTP streams. Compressed Real-time Transport Protocol (cRTP) and Secure Real-time Transport Protocol (sRTP) were developed to enhance the usage of RTP.

Datagram protocols, such as UDP, send a media stream as a series of small packets. This approach is simple and efficient. However, packets are liable to be lost or corrupted in transit. Depending on the protocol and the extent of the loss, a client might be able to recover lost data with error correction techniques, might interpolate over the missing data, or might suffer a data dropout. RTP and the RTCP were specifically designed to stream media over networks. They are both built on top of UDP.

Real-Time Transport Protocol

RTP defines a standardized packet format for delivering audio and video over the Internet. It was developed by the Audio-Video Transport Working Group of the IETF and was first published in 1996 as RFC 1889, which was made obsolete in 2003 by RFC 3550.

RTP provides end-to-end network transport functions intended for applications with real-time transmission requirements, such as audio and video. Those functions include payload-type identification, sequence numbering, time stamping, and delivery monitoring. Figure 1-6 shows a typical role played by RTP in a VoIP network. Specifically, notice that RTP communicates directly between the voice endpoints, whereas the call setup protocols (that is, H.225 and H.245 in this example) are used to communicate with voice gateways.

Figure 1-6 Role of RTP

RTP typically runs on top of UDP to use the multiplexing and checksum services of that protocol. RTP does not have a standard TCP or UDP port on which it communicates. The only standard it obeys is that UDP communications are done via an even port, and the next higher odd port is used for RTCP communications. Although no standards are assigned, in a Cisco environment RTP is generally configured to use UDP ports in the range 16,384-32,767.
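The even/odd port convention and the Cisco port range described above can be expressed in a few lines. This is a minimal Python sketch; the constant and function names are invented for illustration, and the range simply restates the 16,384 to 32,767 figures given in the text.

```python
# Port range quoted in the text for Cisco environments.
CISCO_RTP_PORT_MIN = 16384
CISCO_RTP_PORT_MAX = 32767


def rtcp_port_for(rtp_port: int) -> int:
    """Return the RTCP port paired with an RTP port (next higher odd port)."""
    if rtp_port % 2 != 0:
        raise ValueError("RTP is expected on an even-numbered port")
    return rtp_port + 1


def in_cisco_rtp_range(port: int) -> bool:
    """Check whether a UDP port falls in the range quoted above."""
    return CISCO_RTP_PORT_MIN <= port <= CISCO_RTP_PORT_MAX


rtp_port = 16384
print(rtp_port, rtcp_port_for(rtp_port), in_cisco_rtp_range(rtp_port))
```

A check like in_cisco_rtp_range is the kind of test a firewall rule has to approximate, which is why a wide dynamic range complicates firewall traversal.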

RTP can carry any data with real-time characteristics, such as interactive audio or video. The fact that RTP uses a dynamic port range can make it difficult for it to traverse firewalls.

Although RTP is often used for unicast sessions, it is primarily designed for multicast sessions. In addition to the roles of sender and receiver, RTP defines the roles of translator and mixer to support multicast requirements.

RTP is frequently used in conjunction with Real-time Streaming Protocol (RTSP) in streaming media systems. RTP is also used in conjunction with H.323 or SIP in videoconferencing and push-to-talk systems. These two characteristics make RTP the technical foundation of the VoIP industry. Applications using RTP are less sensitive to packet loss, but typically very sensitive to delays, so UDP is a better choice than TCP for such applications.

RTP is a critical component of VoIP because it enables the destination device to reorder and retime the voice packets before they are played out to the user. An RTP header contains a time stamp and sequence number, which allow the receiving device to buffer and to remove jitter by synchronizing the packets to play back a continuous stream of sound. RTP uses sequence numbers only to order the packets. RTP does not request retransmission if a packet is lost.
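To make the reordering role of the sequence number concrete, the sketch below unpacks the 12-byte fixed RTP header and sorts a handful of packets into playout order. It is a simplified illustration, not a jitter buffer: the helper names, the SSRC value, and the sample payloads are invented, header extensions and CSRC entries are ignored, and 16-bit sequence-number wraparound is not handled.

```python
import struct

# Fixed RTP header: flags byte, marker/payload-type byte,
# 16-bit sequence number, 32-bit timestamp, 32-bit SSRC (12 bytes).
RTP_HEADER = struct.Struct("!BBHII")


def parse_rtp_header(packet: bytes) -> dict:
    """Decode the fixed 12-byte RTP header at the start of a packet."""
    if len(packet) < RTP_HEADER.size:
        raise ValueError("packet shorter than the 12-byte RTP header")
    b0, b1, seq, ts, ssrc = RTP_HEADER.unpack_from(packet)
    return {
        "version": b0 >> 6,
        "marker": bool(b1 >> 7),
        "payload_type": b1 & 0x7F,
        "sequence": seq,
        "timestamp": ts,
        "ssrc": ssrc,
    }


def playout_order(packets):
    """Sort packets by sequence number (wraparound ignored in this sketch)."""
    return sorted(packets, key=lambda p: parse_rtp_header(p)["sequence"])


def make_packet(seq: int, ts: int, payload: bytes = b"") -> bytes:
    """Build a test packet; 0x80 sets RTP version 2, the SSRC is arbitrary."""
    return RTP_HEADER.pack(0x80, 0, seq, ts, 0x1234ABCD) + payload


# Packets arrive out of order and are put back in sequence before playout.
arrived = [make_packet(3, 480, b"c"), make_packet(1, 160, b"a"), make_packet(2, 320, b"b")]
print([parse_rtp_header(p)["sequence"] for p in playout_order(arrived)])
```

Note that a gap in the sorted sequence numbers only reveals a loss; nothing in the header lets the receiver ask for the packet again.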

RTP Control Protocol

RTCP is a sister protocol of RTP. It was first defined in RFC 1889 and was made obsolete by RFC 3550. RTCP provides out-of-band control information for an RTP flow. It works alongside RTP in the delivery and packaging of multimedia data, but does not transport any data itself. Although RTCP is periodically used to transmit control packets to participants in a streaming multimedia session, the primary function of RTCP is to provide feedback on the quality of service being provided by RTP.

RTCP is used for QoS reporting. It gathers statistics on a media connection and information such as bytes sent, packets sent, lost packets, jitter, feedback, and round-trip delay. Applications use this information to increase the quality of service, perhaps using a low-compression codec instead of a high-compression codec.

There are several types of RTCP packets: Sender Report Packet, Receiver Report Packet, Source Description RTCP Packet, Goodbye RTCP Packet, and application-specific RTCP packets.

RTCP provides the following feedback on current network conditions:

■ RTCP provides a mechanism for hosts involved in an RTP session to exchange information about monitoring and controlling the session. RTCP monitors the quality of elements such as packet count, packet loss, delay, and interarrival jitter. RTCP transmits packets as a percentage of session bandwidth, but at a specific rate of at least every five seconds.

■ The RTP standard states that the Network Time Protocol (NTP) time stamp is based on synchronized clocks. The corresponding RTP time stamp is randomly generated and based on data packet sampling. Both the NTP and RTP time stamps are included in RTCP packets by the sender of the data.

■ RTCP provides a separate flow from RTP. When a voice stream is assigned UDP port numbers, RTP is typically assigned an even-numbered port and RTCP is assigned the next odd-numbered port. Each voice call has four ports assigned: RTP plus RTCP in the transmit direction and RTP plus RTCP in the receive direction.
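The interarrival jitter that RTCP reports is defined in RFC 3550 as a running estimate: for each packet, the receiver compares the packet's transit time with that of the previous packet and moves the estimate one-sixteenth of the way toward the absolute difference. The Python sketch below follows that formula; the function names and sample values are invented for illustration, and arrival times are assumed to be expressed in the same units as the RTP time stamp.

```python
def update_jitter(jitter, prev_transit, arrival, rtp_timestamp):
    """Apply one step of the RFC 3550 estimate J = J + (|D| - J) / 16.

    arrival and rtp_timestamp are assumed to share the same clock units.
    Returns the new jitter estimate and this packet's transit time.
    """
    transit = arrival - rtp_timestamp
    if prev_transit is None:
        # First packet: nothing to compare against yet.
        return jitter, transit
    d = abs(transit - prev_transit)
    return jitter + (d - jitter) / 16.0, transit


def jitter_for(samples):
    """Run the estimator over (arrival, rtp_timestamp) pairs."""
    jitter, prev_transit = 0.0, None
    for arrival, rtp_timestamp in samples:
        jitter, prev_transit = update_jitter(jitter, prev_transit, arrival, rtp_timestamp)
    return jitter


# Evenly spaced arrivals produce no jitter; a packet 20 units late does.
print(jitter_for([(1000, 0), (1160, 160)]))
print(jitter_for([(1000, 0), (1180, 160)]))
```

The division by 16 smooths the estimate so that a single late packet nudges the reported jitter rather than dominating it.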

Compressed RTP

RTP includes a data portion and a header portion. The data portion of RTP is a thin protocol that provides support for the real-time properties of applications, such as continuous media, including timing reconstruction, loss detection, and content identification. The header portion of RTP is considerably larger than the data portion. The header portion consists of the IP segment, the UDP segment, and the RTP segment. Given the size of the IP/UDP/RTP segment combinations, it is inefficient to send the IP/UDP/RTP header without compressing it. Figure 1-7 illustrates using cRTP over a relatively low-speed WAN link (such as a T1 link), which could benefit from the bandwidth freed up by compressing the IP/UDP/RTP header.

Figure 1-7 RTP Header Compression

The header portion consists of an IP segment, a UDP segment, and an RTP segment. The minimal 20 bytes of the IP segment, combined with the 8 bytes of the UDP segment and the 12 bytes of the RTP segment, create a 40-byte IP/UDP/RTP header. The RTP packet has a payload of approximately 20 to 150 bytes for audio applications that use compressed payloads.

The RTP header compression feature compresses the IP/UDP/RTP header in an RTP data packet from 40 bytes to approximately 2 to 4 bytes.
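A short calculation shows why the saving matters for small payloads. The Python sketch below uses the header sizes given above (40 bytes uncompressed, 2 to 4 bytes compressed) and a 20-byte payload; the rate of 50 packets per second is an assumption made for this example, Layer 2 overhead is ignored, and the function names are invented.

```python
# Header sizes taken from the text: IP (20) + UDP (8) + RTP (12).
IP_UDP_RTP_HEADER_BYTES = 20 + 8 + 12


def stream_bandwidth_bps(payload_bytes, header_bytes, packets_per_second):
    """Bandwidth of one voice stream in bits per second (no Layer 2 overhead)."""
    return (payload_bytes + header_bytes) * 8 * packets_per_second


def header_share(payload_bytes, header_bytes):
    """Fraction of each packet that is header rather than voice payload."""
    return header_bytes / (payload_bytes + header_bytes)


PAYLOAD = 20   # bytes, the low end of the range quoted above
PPS = 50       # assumed packet rate for this example

uncompressed = stream_bandwidth_bps(PAYLOAD, IP_UDP_RTP_HEADER_BYTES, PPS)
compressed = stream_bandwidth_bps(PAYLOAD, 2, PPS)
print(uncompressed, compressed)   # 24000 and 8800 bits per second
print(round(header_share(PAYLOAD, IP_UDP_RTP_HEADER_BYTES), 2))
```

Under these assumptions, two-thirds of every uncompressed packet is header, and compressing the header to 2 bytes cuts the stream from 24,000 to 8,800 bits per second.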

cRTP, specified in RFCs 2508, 2509, and 3545, was developed to decrease the size of the IP, UDP, and RTP headers.

■ RFC 2508: Compressing IP/UDP/RTP Headers for Low-Speed Serial Links

■ RFC 2509: IP Header Compression over PPP

■ RFC 3545: Enhanced Compressed RTP (ECRTP) for Links with High Delay, Packet Loss and Reordering

cRTP, as specified in RFC 2508, was designed to work with reliable and fast point-to-point links. In less than optimal circumstances, where there might be long delays, packet loss, and out-of-sequence packets, cRTP doesn't function well for VoIP applications. Another adaptation, Enhanced Compressed RTP (ECRTP), was defined in RFC 3545 to overcome that problem.

RTP header compression is supported on serial lines using Frame Relay, HDLC, or PPP encapsulation. It is also supported over ISDN interfaces.

Why and When to Use cRTP

cRTP does not technically perform compression. Rather, cRTP leverages the fact that much of the header information in every packet in a VoIP stream contains redundant information, and cRTP then suppresses the sending of that redundant information. For example, after a VoIP call flow is established, every packet has the same source and destination IP addresses, the same source and destination UDP port numbers, and the same RTP payload type. By caching this redundant information in the gateways at each end of a link, sending reduced headers, and then reassembling the full header, cRTP can achieve significant bandwidth savings without any loss of information.
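The caching idea can be sketched with a toy compressor and decompressor. This Python example is not the RFC 2508 wire format: the class names, the dictionary-based "packets," the field names, and the sample addresses are all invented for illustration. It only shows the principle that fields that stay constant for a call are sent once and then rebuilt from a cached context at the far end.

```python
# Fields that stay constant for the life of a call, per the text.
STATIC_FIELDS = ("src_ip", "dst_ip", "src_port", "dst_port", "payload_type")


class ToyCompressor:
    """Sends the full header once per flow, then only the changing fields."""

    def __init__(self):
        self.contexts = {}  # static field values -> context ID

    def compress(self, header: dict) -> dict:
        key = tuple(header[f] for f in STATIC_FIELDS)
        if key not in self.contexts:
            cid = len(self.contexts)
            self.contexts[key] = cid
            return {"cid": cid, "full": dict(header)}
        dynamic = {k: v for k, v in header.items() if k not in STATIC_FIELDS}
        return {"cid": self.contexts[key], **dynamic}


class ToyDecompressor:
    """Rebuilds the full header from the cached context."""

    def __init__(self):
        self.contexts = {}  # context ID -> static field values

    def decompress(self, packet: dict) -> dict:
        cid = packet["cid"]
        if "full" in packet:
            full = packet["full"]
            self.contexts[cid] = {f: full[f] for f in STATIC_FIELDS}
            return dict(full)
        dynamic = {k: v for k, v in packet.items() if k != "cid"}
        return {**self.contexts[cid], **dynamic}


# Hypothetical call flow: only sequence and timestamp change between packets.
flow = {"src_ip": "10.1.1.1", "dst_ip": "10.2.2.2", "src_port": 16384,
        "dst_port": 16390, "payload_type": 18}
first = {**flow, "sequence": 1, "timestamp": 160}
second = {**flow, "sequence": 2, "timestamp": 320}

tx, rx = ToyCompressor(), ToyDecompressor()
sent_first, sent_second = tx.compress(first), tx.compress(second)
print(sorted(sent_second))                    # reduced header fields
print(rx.decompress(sent_first) == first, rx.decompress(sent_second) == second)
```

Because the receiver depends on its cached context, a lost or reordered packet can leave the two ends out of step, which is the weakness that the enhanced variant of cRTP targets.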

RTP header compression also reduces overhead for multimedia RTP traffic. The reduction in overhead for multimedia RTP traffic results in a corresponding reduction in delay. RTP header compression is especially beneficial when the RTP payload size is small; for example, for compressed audio payloads of 20 to 50 bytes.

Use RTP header compression on any WAN interface where you are concerned about bandwidth and where there is a high portion of RTP traffic. RTP header compression can be used for media-on-demand and interactive services such as Internet telephony. RTP header compression provides support for real-time conferencing of groups of any size within the Internet. This support includes source identification support for gateways such as audio and video bridges and support for multicast-to-unicast translators. RTP header compression can benefit both telephony voice and multicast backbone (MBONE) applications running over slow links.

Note Using RTP header compression on any high-speed interfaces (that is, anything over T1 speed) is not recommended. Any bandwidth savings achieved with RTP header compression might be offset by an increase in CPU utilization on the router.

Secure RTP

sRTP was first published by the IETF in March 2004 as RFC 3711. It was designed to provide encryption, message authentication and integrity, and replay protection to RTP data in both unicast and multicast applications.

sRTP also has a sister protocol, called Secure RTCP (sRTCP). sRTCP provides the same security-related features to RTCP as the ones provided by sRTP to RTP. sRTP can be used in conjunction with compressed RTP. Figure 1-8 demonstrates that an sRTP flow travels between devices (Cisco IP phones in Figure 1-8), which are capable of sending and receiving sRTP traffic.

Figure 1-8 Secure RTP Traffic Flow

Flow Encryption

sRTP standardizes utilization of only a single cipher, Advanced Encryption Standard (AES), which can be used in two cipher modes, which turn the original block AES cipher into a stream cipher:

■ Segmented Integer Counter Mode: A counter mode that allows random access to any blocks and that is essential for RTP traffic running over unreliable networks with possible loss of packets. AES running in this mode is the default encryption algorithm, with a default encryption key length of 128 bits and a default session salt key length of 112 bits.

■ f8-mode: A variation of output feedback mode. The default values of the encryption key and salt key are the same as for AES in Counter Mode.

In addition to the AES cipher, sRTP lets the user disable encryption outright by using the so-called NULL cipher. The NULL cipher does not perform any encryption. Rather, the encryption algorithm functions as though the key stream contains only zeroes, and it copies the input stream to the output stream without any changes.
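The behavior of the NULL cipher follows directly from how a stream cipher works: the payload is XORed with a key stream, and XOR with zero leaves every byte unchanged. The following Python sketch shows only that property; the function names are invented, and no real AES key stream is generated here.

```python
def xor_keystream(data: bytes, keystream: bytes) -> bytes:
    """Combine data with a key stream, as a stream cipher does."""
    return bytes(b ^ k for b, k in zip(data, keystream))


def null_cipher(data: bytes) -> bytes:
    """NULL cipher: behave as though the key stream were all zeroes."""
    return xor_keystream(data, bytes(len(data)))


payload = b"voice sample"
print(null_cipher(payload) == payload)
```

Applying xor_keystream twice with the same nonzero key stream also returns the original data, which is why a stream cipher uses the same operation to encrypt and decrypt.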

Note It is mandatory for the NULL cipher mode to be implemented in any sRTP-compatible system. As such, it can be used when the confidentiality guarantees ensured by sRTP are not required, and other sRTP features (such as authentication and message integrity) might be used.

Because encryption algorithms do not secure message integrity themselves, allowing the attacker to either forge the data or at least to replay previously transmitted data, sRTP also provides the means to secure the integrity of data and safety from replay.

Authentication and Integrity

The HMAC-SHA1 algorithm (defined in RFC 2104) is used to authenticate a message and protect its integrity. This algorithm produces a 160-bit result, which is then truncated to 80 bits to become the authentication tag, which is then appended to a packet. The HMAC is calculated over the packet payload and material from the packet header, including the packet sequence number.
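The truncation step can be reproduced with Python's standard hmac and hashlib modules. This is a simplified sketch: the function names, the key, and the packet bytes are invented, and the input here is just an opaque byte string rather than the exact set of header and payload fields that a real sRTP implementation authenticates with a derived session key.

```python
import hashlib
import hmac

TAG_LENGTH_BYTES = 10  # 80 bits, as described in the text


def auth_tag(key: bytes, packet: bytes) -> bytes:
    """Compute HMAC-SHA1 (160 bits) and truncate it to an 80-bit tag."""
    full_digest = hmac.new(key, packet, hashlib.sha1).digest()
    return full_digest[:TAG_LENGTH_BYTES]


def verify(key: bytes, packet: bytes, tag: bytes) -> bool:
    """Recompute the tag and compare in constant time."""
    return hmac.compare_digest(auth_tag(key, packet), tag)


# Hypothetical key and packet contents, for illustration only.
key = b"example-session-auth-key"
packet = b"rtp-header-and-payload"
tag = auth_tag(key, packet)

print(len(tag) * 8)                          # tag length in bits
print(verify(key, packet, tag))              # untouched packet
print(verify(key, packet + b"x", tag))       # tampered packet
```

Any change to the authenticated bytes produces a different tag, so a receiver that recomputes the tag detects tampering without needing to decrypt anything first.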

Replay Protection

To protect against replay attacks, a receiver must maintain the indices of previously received messages, comparing them with the index of each newly received message and admitting the new message only if it has not been received before. Such an approach relies heavily on integrity protection being enabled (to make it nearly impossible to spoof message indices).
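One common way to keep the list of received indices bounded is a sliding window: the receiver remembers only recent indices and rejects anything that is either already recorded or too old to check. The Python sketch below illustrates that approach; the class name is invented, the default window size of 64 is an assumption chosen for this example, and the sketch presumes the packet's authentication tag has already been verified before accept is called.

```python
class ReplayWindow:
    """Track recently received packet indices and reject repeats."""

    def __init__(self, size: int = 64):
        self.size = size      # how far back indices are remembered (assumed value)
        self.highest = -1     # highest index accepted so far
        self.seen = set()     # accepted indices still inside the window

    def accept(self, index: int) -> bool:
        """Return True if the packet is new, False if it must be dropped."""
        if index <= self.highest - self.size:
            return False      # older than the window: cannot be checked, so drop
        if index in self.seen:
            return False      # already received: replay
        self.seen.add(index)
        if index > self.highest:
            self.highest = index
            # Forget indices that have slid out of the window.
            cutoff = self.highest - self.size
            self.seen = {i for i in self.seen if i > cutoff}
        return True


window = ReplayWindow(size=4)
print([window.accept(i) for i in (1, 1, 3, 2, 10, 5, 7)])
```

In this run the second copy of index 1 is rejected as a replay, and index 5 is rejected because it arrives after the window has moved on to index 10.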
