VoIP Fundamentals (Considering VoIP Design Elements) Part 2

Audio Quality Measurement

Several methods can be used to determine signal quality, including the following:

■ Mean Opinion Score (MOS)

■ Perceptual Speech Quality Measurement (PSQM)

■ Perceptual Evaluation of Speech Quality (PESQ)


MOS is a scoring system for voice quality. A MOS score is generated when listeners evaluate prerecorded sentences that are subjected to varying conditions, such as different compression algorithms. Listeners rate each sentence on a scale from 1 through 5, where 1 is the worst and 5 is the best. The sentence used for English-language MOS testing is, "Nowadays, a chicken leg is a rare dish." This sentence is used because it contains a wide range of sounds found in human speech, such as long vowels, short vowels, hard sounds, and soft sounds.

The test scores are then averaged into a composite score. The results are subjective because they are based on the opinions of the listeners. They are also relative: a score of 3.8 from one test cannot be directly compared to a score of 3.8 from another test. Therefore, you must establish a baseline for all tests, such as G.711, so the scores can be normalized and compared directly.
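The averaging and baseline comparison described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the ratings are hypothetical values from a single listening panel, the function names are hypothetical, and expressing the result as a ratio to the G.711 score is an assumed normalization, since the text says only that scores must be normalized against a common baseline.

```python
from statistics import mean

# Illustrative sketch; function names are hypothetical.

def mos(ratings):
    """Average listener ratings (1 = worst, 5 = best) into a composite score."""
    if not ratings:
        raise ValueError("at least one rating is required")
    for r in ratings:
        if not 1 <= r <= 5:
            raise ValueError(f"rating {r} is outside the 1-5 scale")
    return mean(ratings)

def relative_to_baseline(test_ratings, baseline_ratings):
    """Express a MOS as a fraction of a baseline MOS from the same test session.

    Assumption: a simple ratio. The text only says scores must be normalized
    against a common baseline such as G.711, not how.
    """
    return mos(test_ratings) / mos(baseline_ratings)

# Hypothetical ratings from one listening panel for the same test sentence.
g711 = [5, 4, 4, 5, 4]       # baseline codec, MOS 4.4
low_rate = [4, 4, 3, 4, 4]   # codec under test

print(mos(low_rate))                                    # 3.8
print(round(relative_to_baseline(low_rate, g711), 3))   # 0.864
```

Because both scores come from the same panel and session, the ratio is meaningful; comparing 3.8 from this test with 3.8 from a different test would not be.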


PSQM is an automated method of measuring speech quality "in service," or as the speech happens. PSQM software usually resides with IP call management systems, which are sometimes integrated into Simple Network Management Protocol (SNMP) systems.

Equipment and software that can measure PSQM are available through third-party vendors; it is not implemented in Cisco equipment. The measurement is made by comparing the original transmitted speech to the resulting speech at the far end of the transmission channel. PSQM systems are deployed as in-service components. The PSQM measurements are made during real conversation on the network. This automated testing algorithm has over 90 percent accuracy compared to subjective listening tests, such as MOS. Scoring is based on a scale from 0 through 6.5, where 0 is the best and 6.5 is the worst. Because it was originally designed for circuit-switched voice, PSQM does not take into account the jitter or delay problems experienced in packet-switched voice systems.


PESQ was specifically developed for end-to-end voice-quality testing under real network conditions, such as VoIP, Plain Old Telephone Service (POTS), Integrated Services Digital Network (ISDN), and Global System for Mobile Communications (GSM). PESQ was developed by KPN Research (now TNO Telecom) in the Netherlands and British Telecommunications (BT) by combining two advanced speech-quality measures: PSQM+ and Perceptual Analysis Measurement System (PAMS).

PESQ, whose operation is shown in Figure 2-4, has been standardized as ITU-T Recommendation P.862, which is considered the current standard for voice-quality measurement. PESQ can take into account codec errors, filtering errors, jitter problems, and delay problems that are typical in a VoIP network. It combines the strengths of PSQM with those of PAMS. PESQ scores range from 1 (worst) through 4.5 (best), with 3.8 considered toll quality (acceptable quality in a traditional telephony network). PESQ measures only one aspect of voice quality: the effects of two-way communication, such as loudness loss, delay, echo, and sidetone, are not reflected in PESQ scores.
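The MOS, PSQM, and PESQ scales quoted in this section run in different directions, which is easy to trip over when reading test reports. The sketch below encodes only the ranges and the 3.8 toll-quality threshold given in the text. The `Scale` class and helper names are hypothetical, and the sketch deliberately makes no attempt to convert a score from one scale to another, because the text provides no such mapping.

```python
from dataclasses import dataclass

# Illustrative sketch; class and function names are hypothetical.

@dataclass(frozen=True)
class Scale:
    """Range and direction of a voice-quality score, as described in the text."""
    name: str
    best: float
    worst: float

    def validate(self, score: float) -> float:
        low, high = sorted((self.best, self.worst))
        if not low <= score <= high:
            raise ValueError(f"{self.name} score {score} is outside {low}-{high}")
        return score

    def is_better(self, a: float, b: float) -> bool:
        """True if score a indicates better quality than score b on this scale."""
        self.validate(a)
        self.validate(b)
        # PSQM counts down (0 is best); MOS and PESQ count up.
        return a < b if self.best < self.worst else a > b

MOS = Scale("MOS", best=5.0, worst=1.0)
PSQM = Scale("PSQM", best=0.0, worst=6.5)
PESQ = Scale("PESQ", best=4.5, worst=1.0)
PESQ_TOLL_QUALITY = 3.8  # threshold quoted in the text

def pesq_is_toll_quality(score: float) -> bool:
    return PESQ.validate(score) >= PESQ_TOLL_QUALITY

print(PSQM.is_better(0.5, 2.0))   # True: a lower PSQM score is better
print(PESQ.is_better(3.2, 4.1))   # False: a higher PESQ score is better
print(pesq_is_toll_quality(3.9))  # True
```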

Figure 2-4 PESQ Operation

Voice-Quality Measurement Comparison

Early quality measurement methods, such as MOS and PSQM, were designed before widespread acceptance of VoIP technology. PESQ was designed to address the shortcomings of MOS and PSQM.

MOS uses subjective testing in which the average opinion of a group of test users is calculated to create the MOS score. This method is both time-consuming and expensive and might not provide consistent results between groups of testers.

PSQM and PESQ use objective testing in which an original reference file sent into the system is compared with the impaired signal that came out. This testing method provides an automated test mechanism that does not rely on human interpretation for result calculations. However, PSQM was originally designed for circuit-switched networks and does not take into account the effects of jitter and packet loss.
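The objective principle (compare the reference that went in with the signal that came out) can be illustrated with a deliberately crude metric. The sketch below computes a plain signal-to-noise ratio between time-aligned reference and degraded samples. This is a swapped-in stand-in, not PSQM or PESQ: those methods apply perceptual models and their own time alignment, which a sample-difference SNR does not capture. The function name and sample values are hypothetical.

```python
import math

# Illustrative sketch only: a plain SNR, NOT the PSQM or PESQ algorithm.

def snr_db(reference, degraded):
    """Signal-to-noise ratio (dB) of a degraded signal against its reference.

    Both arguments are equal-length sequences of PCM samples that are already
    time-aligned. The "noise" is simply the sample-by-sample difference.
    """
    if not reference or len(reference) != len(degraded):
        raise ValueError("need two non-empty, equal-length sample sequences")
    signal = sum(x * x for x in reference)
    noise = sum((x - y) ** 2 for x, y in zip(reference, degraded))
    if noise == 0:
        return math.inf          # identical signals: no measurable impairment
    if signal == 0:
        raise ValueError("reference signal is silent")
    return 10 * math.log10(signal / noise)

ref = [1000, -1000, 1000, -1000]   # hypothetical reference samples
out = [900, -900, 900, -900]       # the same samples after a lossy path
print(snr_db(ref, out))            # 20.0
```

No human listener is involved, so repeated runs on the same files give the same number, which is the property that separates objective methods from MOS.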

PESQ measures the effect of end-to-end network conditions, including codec processing, jitter, and packet loss. Therefore, PESQ is the preferred method of testing voice quality in an IP network. Table 2-3 compares the features of MOS, PSQM, and PESQ.

Table 2-3 Quality Measurement Comparison

Feature                        MOS          PSQM         PESQ
Test method                    Subjective   Objective    Objective
End-to-end packet loss test    No           No           Yes
End-to-end jitter test         No           No           Yes

VoIP and QoS

Real-time applications, such as voice applications, have different characteristics and requirements from those of traditional data applications. Because they are real-time based, voice applications tolerate minimal variation in the amount of delay affecting delivery of their voice packets. Voice traffic is also intolerant of packet loss and jitter, both of which unacceptably degrade the quality of the voice transmission delivered to the recipient end user. To effectively transport voice traffic over IP, mechanisms are required that ensure reliable delivery of voice packets. Cisco IOS QoS features collectively embody these techniques, offering the means to provide priority service that meets the stringent requirements of voice packet delivery.

The QoS components for Cisco Unified Communications are provided through the IP traffic management, queuing, and shaping capabilities of a Cisco IP network infrastructure.

Following are a few of the Cisco IOS features that address the requirements of end-to-end QoS and service differentiation for voice packet delivery:

■ Header Compression: Used in conjunction with Real-time Transport Protocol (RTP) and Transmission Control Protocol (TCP), it compresses the large IP/UDP/RTP or TCP/IP headers, reducing the bandwidth consumed by voice traffic. A corresponding reduction in delay is realized.

■ Frame Relay Traffic Shaping (FRTS): Delays excess traffic using a buffer or queuing mechanism to hold packets and shape the flow when the data rate of the source is higher than expected.

■ FRF.12 (and Higher): Ensures predictability for voice traffic, aiming to provide better throughput on low-speed Frame Relay links by interleaving delay-sensitive voice traffic on one virtual circuit (VC) with fragments of a long frame on another VC utilizing the same interface.

■ Public Switched Telephone Network (PSTN) Fallback: Provides a mechanism to monitor congestion in the IP network and either redirect calls to the PSTN or reject calls based on the network congestion.

■ IP RTP Priority and Frame Relay IP RTP Priority: Provide a strict priority queuing scheme that allows delay-sensitive data, such as voice, to be dequeued and sent before packets in other queues are dequeued. These features are especially useful on slow-speed WAN links, including Frame Relay, Multilink PPP (MLP), and T1 ATM links. They work with weighted fair queuing (WFQ) and Class-Based WFQ (CBWFQ).

■ IP to ATM Class of Service (CoS): Includes a feature suite that maps QoS characteristics between IP and ATM. Offers differential service classes across the entire WAN, not just the routed portion. Gives mission-critical applications exceptional service during periods of high network usage and congestion.

■ Low Latency Queuing (LLQ): Provides strict priority queuing on ATM VCs and serial interfaces. This feature enables you to configure the priority status for a class within CBWFQ and is not limited to User Datagram Protocol (UDP) port numbers, as is IP RTP Priority.

■ MLP: Allows large packets to be multilink encapsulated and fragmented so they are small enough to satisfy the delay requirements of real-time traffic. MLP also provides a special transmit queue for smaller, delay-sensitive packets, enabling them to be sent earlier than other flows.

■ Resource Reservation Protocol (RSVP): Supports the reservation of resources across an IP network, allowing end systems to request QoS guarantees from the network. For networks supporting VoIP, RSVP (in conjunction with features that provide queuing, traffic shaping, and voice call signaling) can provide call admission control (CAC) for voice traffic. Cisco also provides RSVP support for LLQ and Frame Relay.
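To put the header compression item in numbers, the sketch below estimates per-call bandwidth with and without compressed RTP (cRTP). It assumes IPv4 (20 bytes), UDP (8 bytes), and RTP (12 bytes) headers, a 2-byte cRTP header (cRTP headers are typically 2 to 4 bytes), a G.729 or G.711 payload at 20 ms per packet, and it ignores Layer 2 overhead. The function and constant names are hypothetical.

```python
# Illustrative sketch; names are hypothetical, Layer 2 overhead is ignored.
IP_UDP_RTP_HEADER = 20 + 8 + 12   # bytes: IPv4 + UDP + RTP
CRTP_HEADER = 2                   # bytes: assumed cRTP header (2 to 4 is typical)

def voice_bandwidth_kbps(codec_kbps, packet_ms, header_bytes):
    """Per-call, one-direction IP bandwidth for a constant-rate voice codec."""
    payload_bytes = codec_kbps * packet_ms / 8   # kbit/s * ms = bits; / 8 = bytes
    packets_per_second = 1000 / packet_ms
    return (payload_bytes + header_bytes) * 8 * packets_per_second / 1000

for name, rate in (("G.729", 8), ("G.711", 64)):
    plain = voice_bandwidth_kbps(rate, 20, IP_UDP_RTP_HEADER)
    crtp = voice_bandwidth_kbps(rate, 20, CRTP_HEADER)
    print(f"{name}: {plain:.1f} kbps -> {crtp:.1f} kbps with cRTP")
# G.729: 24.0 kbps -> 8.8 kbps with cRTP
# G.711: 80.0 kbps -> 64.8 kbps with cRTP
```

Under these assumptions, the 40-byte header is twice the size of a 20-byte G.729 payload, so compressing it cuts that call's bandwidth by almost two-thirds, while the saving on a G.711 call is proportionally much smaller.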

Objectives of QoS

To ensure VoIP is an acceptable replacement for standard PSTN telephony services, customers must receive the same consistently high quality of voice transmission they receive with basic telephone services. Like other real-time applications, VoIP is extremely sensitive to issues related to bandwidth and delay. To ensure VoIP transmissions are intelligible to the receiver, voice packets must not be dropped, excessively delayed, or subjected to variations in delay (jitter). A successful VoIP deployment must provide an acceptable level of voice quality by meeting VoIP traffic requirements for bandwidth, latency, and jitter.

QoS refers to the ability of a network to provide improved service to selected network traffic over various underlying technologies, including Frame Relay, ATM, Ethernet and 802.1 networks, SONET, and IP-routed networks. VoIP can deliver high-quality voice transmission only if the signaling and audio channel packets have priority over other kinds of network traffic.

In particular, QoS features provide improved and more predictable network service by implementing the following services:

■ Support guaranteed bandwidth: Designing the network so the necessary bandwidth is always available to support voice and data traffic

■ Improve loss characteristics: Designing the Frame Relay network, for example, so discard eligibility is not a factor for frames containing voice, keeping voice below the committed information rate (CIR)

■ Avoid and manage network congestion: Ensuring the LAN and WAN infrastructure can support the volume of data traffic and voice calls

■ Shape network traffic: Using Cisco traffic-shaping tools to ensure smooth and consistent delivery of frames to the WAN

■ Set traffic priorities across the network: Marking voice traffic as priority and queuing it first

Using QoS to Improve Voice Quality

Voice features that provide QoS are deployed at different points in the network and designed for use with other QoS features to achieve specific goals, such as minimization of jitter and delay.

Cisco IOS Software includes a complete set of features for delivering QoS throughout the network. Although a complete survey of QoS features is beyond the scope of this topic, Cisco’s recommended QoS mechanism for VoIP queuing, in a router’s output interface, is LLQ.

LLQ provides strict priority queuing (PQ) in conjunction with CBWFQ. LLQ configures the priority status for a class within CBWFQ, in which voice packets receive priority over all other traffic.

For example, consider Figure 2-5. Whereas web traffic receives at least 128 kbps of bandwidth (if the web traffic needs that much bandwidth), voice traffic receives 256 kbps of "priority" bandwidth (if the voice traffic needs that much bandwidth), meaning the voice traffic is transmitted first, ahead of the web traffic. However, the voice traffic will not starve out the other traffic types, because the voice traffic is also limited to consuming no more than 256 kbps.
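The allocation behavior in the Figure 2-5 example can be modeled roughly as follows. This is a simplified sketch, not how Cisco IOS actually schedules packets: it assumes a fully congested link, uses a hypothetical 1536-kbps link, hypothetical demand figures, and a hypothetical "other" class, and it shares leftover bandwidth among non-priority classes in proportion to their guarantees, which is an assumed simplification of CBWFQ. Only the 256-kbps priority cap and the 128-kbps web guarantee come from the text.

```python
# Illustrative sketch of LLQ-style allocation; all names are hypothetical.

def llq_allocate(link_kbps, voice_demand, voice_limit, classes):
    """Split a congested link's bandwidth between a priority class and CBWFQ classes.

    classes maps a class name to (demand_kbps, guaranteed_kbps); every
    guarantee must be positive.
    """
    # 1. The priority (voice) class is served first but capped at its limit,
    #    so it cannot starve the other classes.
    voice = min(voice_demand, voice_limit)
    # 2. Each CBWFQ class receives its guarantee, or its demand if smaller.
    alloc = {name: min(d, g) for name, (d, g) in classes.items()}
    spare = link_kbps - voice - sum(alloc.values())
    if spare < 0:
        raise ValueError("priority limit plus guarantees exceed the link")
    # 3. Leftover bandwidth goes to classes that still want more, in
    #    proportion to their guarantees (assumed simplification).
    while spare > 1e-9:
        hungry = [n for n, (d, _) in classes.items() if alloc[n] < d]
        if not hungry:
            break
        weight = sum(classes[n][1] for n in hungry)
        given = 0.0
        for n in hungry:
            demand, guarantee = classes[n]
            new = min(alloc[n] + spare * guarantee / weight, demand)
            given += new - alloc[n]
            alloc[n] = new
        spare -= given
    return {"voice": voice, **alloc}

# Voice offers 400 kbps but is capped at its 256-kbps priority bandwidth.
result = llq_allocate(1536, 400, 256, {"web": (1000, 128), "other": (1000, 256)})
print({k: round(v, 1) for k, v in result.items()})
# {'voice': 256, 'web': 426.7, 'other': 853.3}
```

Voice is never given more than 256 kbps even though it offers 400, and web never drops below its 128-kbps guarantee, which mirrors the two properties the example describes.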

Figure 2-5 Low Latency Queuing Example

Transporting Modulated Data over IP Networks

An IP, or packet-switched, network enables data to be sent in packets to remote locations. The data is assembled by a packet assembler/disassembler (PAD) into individual packets of data, involving a process of segmentation or subdivision of larger sets of data as specified by the native protocol of the sending device. Each packet has a unique identifier that makes it independent and has its own destination address. Because each packet is unique and independent, it can traverse the network in a stream of packets and use different routes. This has implications for fax transmissions, which are carried in data packets rather than as an analog signal over a circuit-switched network.

Differences from Fax Transmission in the PSTN

In IP networks, individual packets that are part of the same data transmission might follow different physical paths of varying lengths. They can also experience varying levels of propagation delay, as well as delay caused by being held in packet buffers awaiting the availability of a subsequent circuit. The packets can also arrive in a different order from the one in which they entered the network. The destination node uses the identifiers and sequencing information in each packet to reassemble the packets in the correct order.
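The reordering step described above can be sketched with sequence-numbered packets. The `Packet` fields, the destination address, and the 4-byte segment size are hypothetical, chosen only to show segmentation, out-of-order arrival, and reassembly; real IP fragmentation and transport-layer sequencing differ in detail.

```python
import random
from dataclasses import dataclass

# Illustrative sketch; Packet and its fields are hypothetical.

@dataclass(frozen=True)
class Packet:
    seq: int        # position of this segment in the original data
    total: int      # number of segments in the whole transmission
    dest: str       # destination address (hypothetical string form)
    payload: bytes

def segment(data: bytes, dest: str, size: int) -> list[Packet]:
    """Split data into independently addressed, numbered packets."""
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    return [Packet(seq, len(chunks), dest, c) for seq, c in enumerate(chunks)]

def reassemble(packets: list[Packet]) -> bytes:
    """Restore the original order using each packet's sequence number."""
    if not packets:
        return b""
    ordered = sorted(packets, key=lambda p: p.seq)
    if [p.seq for p in ordered] != list(range(ordered[0].total)):
        raise ValueError("packets are missing or duplicated")
    return b"".join(p.payload for p in ordered)

packets = segment(b"fax page image data", "10.1.1.2", size=4)
random.Random(7).shuffle(packets)     # simulate out-of-order arrival
print(reassemble(packets))            # b'fax page image data'
```

Carrying the total count in every packet lets the receiver notice a missing final segment, which sorting alone would not reveal.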

Fax transmissions are designed to operate across a 64 kbps pulse code modulation (PCM) encoded voice circuit, but in packet networks, the 64 kbps stream is often compressed into a much smaller data rate by passing it through a digital signal processor (DSP). The codecs normally used to compress a voice stream in a DSP are designed to compress and decompress human speech, not fax or modem tones. For this reason, faxes and modems are rarely used in a VoIP network without some kind of relay or pass-through mechanism in place.

Fax Services over IP Networks

There are three conceptual methods of carrying fax-machine-to-fax-machine communications across packet networks:

■ Fax relay: The T.30 fax from the PSTN is demodulated at the sending gateway. The demodulated fax content is enveloped into packets, sent over the network, and remodulated into T.30 fax at the receiving end.

Note Cisco IOS supports two types of fax relay: T.38 fax relay and Cisco Fax Relay (which is proprietary).

■ Fax pass-through: Modulated fax information from the PSTN is passed in-band end-to-end over a voice speech path in an IP network. There are two pass-through techniques:

■ The configured voice codec is used for the fax transmission. This technique works only when the configured codec is G.711 with no voice activity detection (VAD) and no echo cancellation (EC), or when the configured codec is a clear-channel codec or G.726/32. Low bit-rate codecs cannot be used for fax transmissions.

■ The gateway dynamically changes the codec from the codec configured for voice to G.711 with no VAD and no EC for the duration of the fax session. This method is specifically referred to as "codec up speed" or "fax pass-through with up speed."

■ Store-and-forward fax: Breaks the fax process into distinct sending and receiving processes and allows fax messages to be stored between those processes. Store-and-forward fax is based on the ITU-T T.37 standard, and it also enables fax transmissions to be received from or delivered to computers rather than fax machines.
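The pass-through rules above amount to a small decision, sketched below. The function name and the codec labels are hypothetical strings for illustration, not Cisco IOS configuration keywords; the conditions themselves are the ones listed in the text.

```python
# Illustrative sketch; function name and codec labels are hypothetical.

def fax_passthrough_codec(voice_codec, vad_enabled, ec_enabled, upspeed_enabled):
    """Return the codec that would carry a fax pass-through call, or None."""
    codec = voice_codec.upper()
    # Technique 1: the configured voice codec can carry the fax as-is.
    if codec == "G.711" and not vad_enabled and not ec_enabled:
        return codec
    if codec in ("CLEAR-CHANNEL", "G.726/32"):
        return codec
    # Technique 2: "codec up speed" switches to G.711 with no VAD and no EC
    # for the duration of the fax session.
    if upspeed_enabled:
        return "G.711"
    # Otherwise pass-through is not possible; fax relay (T.38 or Cisco Fax
    # Relay) or store-and-forward fax (T.37) would be needed instead.
    return None

print(fax_passthrough_codec("G.729", True, True, upspeed_enabled=True))   # G.711
print(fax_passthrough_codec("G.729", True, True, upspeed_enabled=False))  # None
```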
