20.3

VoIP VOICE QUALITY CONSIDERATIONS
A VoIP solution supports full-chain voice and fax modules, as explained in the previous topics. Many of these algorithms and modules are based on several standards. In VoIP solutions, the main voice quality goal is to match or exceed PSTN quality. To achieve this goal, several proprietary solutions are incorporated into voice processing. The major influencing factors can be analyzed through voice quality measurement models such as the E-model and PESQ, and through perception in subjective listening. In VoIP, the critical areas that contribute to voice quality are given below [Alexander (2006), Bellamy (1991), ITU-T-G.1020 (2006), ITU-T-G.177 (1999), TIA/EIA-810A (2000), TIA/EIA-116A (2006)]. Some of these aspects are also covered in previous topics, where individual operations such as the codec, echo canceller, VAD/CNG, PLC, and jitter buffer are presented. A summary of the major voice-quality-influencing operations and improvements is as follows:
• End-to-end delay reduction
• Packet flow impediments in processing
• Adaptive jitter buffer with utilization of silence zones
• Packet loss concealment
• Echo cancellation
• Voice compression codecs
• Narrowband coding
• Wideband coding
• Transcoding and tandem operation of codecs
• Codecs and congestion
• Country-specific deviations
• Signal transmission characteristics
• Transmission loss planning
• Subscriber line interface circuit (SLIC)-CODEC interfaces and configurations
• Dual-tone multifrequency (DTMF) rejection as annoyance
• Quality-of-service (QoS) considerations
• GR-909 telephone interface diagnostics
• Voice quality monitoring and RTCP-XR
• Miscellaneous points of voice quality
20.3.1

End-to-End Delay Reduction

According to the G.114 recommendation [ITU-T-G.114 (2003a)], the end-to-end delay from talker to listener should be less than 150 ms. End-to-end delay is also known by several other names, such as one-way delay, mean one-way delay, and half of the round-trip delay. The TIA/EIA-116A document [TIA/EIA-116A (2006)] suggests keeping delays below 100 ms. The literature [URL (Cisco-delay)] provides a more detailed breakdown of these delays. For inter-regional (international) calls, a delay of 300 ms is considered, with an upper limit of 400 ms, and many adaptive jitter buffer algorithms use 400 ms as their default upper boundary. In the previous sections on the E-model, 177.3 ms is identified as the talk-turn time; keeping delays below 177.3 ms prevents frequent double-talk situations.
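The delay guidelines above can be captured in a small sketch. It classifies a one-way delay against the 100-ms, 150-ms, and 400-ms figures quoted in the text and, as an assumption, estimates the delay impairment with the widely cited simplified approximation of the E-model (Cole and Rosenbluth), Id = 0.024d + 0.11(d - 177.3)H(d - 177.3), where H is the unit step. This approximation is not the full G.107 delay-impairment calculation, and the function names are illustrative only.

```python
"""Sketch: one-way delay guidelines and a simplified delay impairment."""

# Assumption: simplified E-model approximation, not the full G.107 Idd formula.
KNEE_MS = 177.3  # talk-turn knee: impairment grows faster beyond this delay


def delay_impairment(d_ms: float) -> float:
    """Approximate delay impairment Id for a one-way delay d_ms (in ms)."""
    extra = max(0.0, d_ms - KNEE_MS)  # plays the role of (d - 177.3) * H(d - 177.3)
    return 0.024 * d_ms + 0.11 * extra


def classify_delay(d_ms: float) -> str:
    """Map a one-way delay onto the guideline bands quoted in the text."""
    if d_ms < 100:
        return "meets TIA/EIA-116A suggestion (<100 ms)"
    if d_ms < 150:
        return "meets G.114 guideline (<150 ms)"
    if d_ms <= 400:
        return "degraded; tolerated mainly on inter-regional calls (<=400 ms)"
    return "exceeds 400 ms upper limit"


if __name__ == "__main__":
    for d in (80, 140, 177.3, 250, 450):
        print(f"{d:6.1f} ms  Id = {delay_impairment(d):5.2f}  {classify_delay(d)}")
```

The slope of Id jumps from 0.024 to 0.134 per millisecond at 177.3 ms, which is why that value acts as a practical boundary for conversational delay.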
In qualitative terms, increasing delay has several adverse effects and prevents the conversation from carrying the proper level of emotion. Delayed responses are received with an element of doubt, as if they were reluctant decisions. With longer delays, the talker appears to hold the floor, and the listener has to interrupt to take a turn in the conversation.
Several parameters contribute to end-to-end delay: the algorithms, software implementation, processor architecture, processing power, interprocessor and interface communication, codecs and packetization, the number of channels processed, other concurrent applications, end-to-end network conditions, physical interfaces, QoS mechanisms such as fragmentation of large packets, jitter buffer design, sampling clock precision, and time-stamp resolution and accuracy. PSTN voice calls operate with minimal delay. With good implementations, 10- to 20-ms packetization, and good network conditions, VoIP calls typically take 60 to 80 ms longer than PSTN calls. The aim in VoIP is to achieve as low a delay as possible, to mimic a PSTN call. In practice, delays are being reduced as competition increases, giving better voice quality than existing deployments. Eventually, however, the achievable reduction is limited by physical distance.
In terms of codec selection, waveform-based (sample-based) speech codecs such as G.711, G.726, and G.722 allow lower delays. In VoIP, however, packets carrying small compressed frames do not use Internet bandwidth efficiently, because the fixed header overhead is large relative to the payload. Packet sizes of 5, 10, or 20 ms are preferred to keep delays low while balancing bandwidth utilization. Codec selection and the associated trade-offs are covered in topics 3 and 11.
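The packet-size trade-off can be made concrete with a short calculation. This sketch assumes IPv4 (20 bytes), UDP (8 bytes), and RTP (12 bytes) headers and ignores layer-2 framing, header compression, and silence suppression, so real link rates will be somewhat higher.

```python
"""Sketch: IP-level bandwidth of one voice stream versus packetization interval."""

# Assumption: IPv4 (20 B) + UDP (8 B) + RTP (12 B); layer-2 framing,
# header compression, and silence suppression are ignored.
HEADER_BYTES = 20 + 8 + 12


def ip_bandwidth_kbps(codec_kbps: float, packet_ms: float) -> float:
    payload_bytes = codec_kbps * packet_ms / 8      # kbit/s * ms = bits; /8 = bytes
    packet_bits = (payload_bytes + HEADER_BYTES) * 8
    return packet_bits / packet_ms                  # bits per ms = kbit/s


if __name__ == "__main__":
    for name, rate in (("G.711", 64.0), ("G.729", 8.0)):
        for ms in (5, 10, 20, 40):
            print(f"{name} {ms:2d} ms: {ip_bandwidth_kbps(rate, ms):6.1f} kbit/s")
```

For G.711, moving from 20-ms to 5-ms packets raises the IP rate from 80 to 128 kbit/s, and for G.729 at 20 ms the headers already make up two-thirds of each packet. This is the trade-off behind preferring 10- or 20-ms packets.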
Incremental delay reductions are difficult to achieve, so VoIP systems should be designed and provisioned for low delay from the outset. Internet service providers and VoIP service providers have to ensure low delay and minimal IP impediments in the delivery of voice packets. At the time of writing, intraregional VoIP calls achieve end-to-end delays on the order of 50 to 80 ms.
Echo is the other main voice-degrading impairment, and it is closely coupled with delay: an increase in delay always brings more echo complaints. The echo canceller design becomes more stringent with longer round-trip delays, and the required cancellation depends on the talker echo loudness rating (TELR) achieved over the end-to-end call. With good phones and the allowed loss settings, delays up to 25 ms are acceptable without echo cancellers [ITU-T-G.131 (2003)]. For the exact values on this echo-delay decision boundary, refer to the mean one-way delay versus TELR graph in the G.131 recommendation and in topic 6. In summary, a one-way delay of 50 to 100 ms is preferred.
Voice quality falls into the degradation zone when delays exceed 150 to 200 ms.
End-to-End Delay Calculation Example with G.729 and 20-ms Packetization. Fig. 20.4 shows the end-to-end delay for the G.729 codec with 20-ms packetization, along with the main delay contributors. Voice entering the telephone SLIC-CODEC interface incurs a delay of 1 to 2 ms. The G.729 codec operates on 10-ms frames (marked F0, F1 . . . F9), so 10 ms of samples must be collected before the encoder can process a frame; this is also called the PCM frame delay. The G.729 encoder also has a 5-ms look-ahead, so 15 ms of input is needed before processing can start. Encoding each frame takes a few milliseconds, shown as the “encoder-processing delay,” and the compressed frames are marked P0, P1 . . . P8. With 20-ms RTP packetization, the compressed payloads P0 + P1, P2 + P3, and so on are combined into each RTP-based VoIP packet. About 5 ms may elapse between the end of the compressed payload and the start of transmission on the IP network interface, marked as “packetization + interface queuing delays.” The 20-ms packets, each carrying two 10-ms payloads, are then sent over the physical network to the destination. Within a local region, this transmission is expected to take about 15 ms. In this example, network impediments are assumed to create less than 20 ms of jitter, or none at all. Jitter buffers typically hold a minimum of 20 to 50 ms of packets even when there are no network impediments; Fig. 20.4 uses a 20-ms minimum jitter buffer delay. At the receiver, the jitter buffer output goes through the decoder and is played out on the telephone interface, which is accounted as 5 ms. The overall delay in this example is 76 ms. For 10-ms IP packets, the delay becomes 66 ms, and 40-ms packets can take 96 ms, according to the same example diagram. Actual delays vary with the processor architecture and with the delay at each stage of processing.

Figure 20.4. End-to-end delay examples for G.729 codec with 20-ms packetization.
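The Fig. 20.4 budget can be reproduced as a simple sum. The text gives the SLIC delay as 1 to 2 ms and the encoder processing only as “a few ms,” so this sketch assumes 2 ms and 4 ms respectively, values chosen so that the 20-ms case matches the stated 76 ms; the other stage values are taken from the text, and the function name is hypothetical.

```python
"""Sketch: one-way delay budget for G.729, following Fig. 20.4."""

G729_FRAME_MS = 10
G729_LOOKAHEAD_MS = 5


def g729_one_way_delay_ms(packet_ms: int,
                          slic_ms: float = 2.0,          # text: 1 to 2 ms
                          encoder_ms: float = 4.0,       # assumed ("a few ms")
                          packetize_queue_ms: float = 5.0,
                          network_ms: float = 15.0,      # local region
                          jitter_buffer_ms: float = 20.0,
                          decode_playout_ms: float = 5.0) -> float:
    if packet_ms <= 0 or packet_ms % G729_FRAME_MS:
        raise ValueError("packet size must be a positive multiple of 10 ms")
    frames_per_packet = packet_ms // G729_FRAME_MS
    # First frame: 10 ms of PCM plus 5 ms look-ahead. Each additional frame in
    # the packet must also be collected before the packet can be sent.
    collect_ms = (G729_FRAME_MS + G729_LOOKAHEAD_MS
                  + (frames_per_packet - 1) * G729_FRAME_MS)
    return (slic_ms + collect_ms + encoder_ms + packetize_queue_ms
            + network_ms + jitter_buffer_ms + decode_playout_ms)


if __name__ == "__main__":
    for size in (10, 20, 40):
        print(f"{size} ms packets: {g729_one_way_delay_ms(size):.0f} ms")
```

Under these assumptions, every extra 10-ms frame per packet adds 10 ms of collection delay, which accounts for the 66-, 76-, and 96-ms figures quoted above.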
Conference Call Delays with an Example. Voice conferencing can degrade voice quality because of transcoding and increased end-to-end delays. As illustrated in Fig. 20.5(a), in a conference call among users A, B, and C, user C is hosting the conference. Voice samples originating from A or B must be routed through the conference mixing bridge, which in Fig. 20.5(a) is located close to user C. Hence, for a conversation between A and B, voice samples from A first travel to the bridge near C and then to B, incurring both the A-C and C-B delays. In many situations, the conference delay is therefore perceived as twice the long-haul delay. In Fig. 20.5(b), conference mixing occurs close to users A and B, and low delays are possible among all users. Overall, the locations of the conference bridge and of the users are important, and it is recommended to place the conference bridge close to the largest group of users to reduce delay. The configuration in Fig. 20.5(b) reduces delay and consumes less IP backbone bandwidth between regions, because fewer streams cross the inter-regional link. The diagrams show three users but can be extended to more users. Users A and B are shown connected to the same gateway, but they can equally be connected to different VoIP gateways or VoIP user terminals located in the same region.

Figure 20.5. Three-way conference example. (a) Mixing at farthest point. (b) Mixing at a point close to maximum number of users.
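The effect of bridge location in Fig. 20.5 can be checked with a toy model. The link delays below are hypothetical example values (A and B in one region, C remote), and the model ignores the bridge's own jitter buffering and mixing delay, which would add to every path.

```python
"""Sketch: effect of conference bridge location on mouth-to-ear delay."""
from itertools import permutations

# Hypothetical one-way network delays (ms). A and B are in the same region;
# C is in a remote region.
LINK_MS = {
    frozenset("AB"): 20.0,
    frozenset("AC"): 100.0,
    frozenset("BC"): 100.0,
}


def link_delay(src: str, dst: str) -> float:
    return 0.0 if src == dst else LINK_MS[frozenset(src + dst)]


def conference_delay(src: str, dst: str, bridge_site: str) -> float:
    """Audio always travels src -> bridge -> dst because mixing happens at the bridge."""
    return link_delay(src, bridge_site) + link_delay(bridge_site, dst)


def worst_pair_delay(bridge_site: str, users: str = "ABC") -> float:
    return max(conference_delay(s, d, bridge_site) for s, d in permutations(users, 2))


if __name__ == "__main__":
    print("Bridge near C (Fig. 20.5a):", worst_pair_delay("C"), "ms")
    print("Bridge near A/B (Fig. 20.5b):", worst_pair_delay("A"), "ms")
```

With the bridge near C, the A-to-B path doubles the long-haul delay (200 ms); moving the bridge next to A and B cuts the worst pair to 120 ms in this example.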
20.3.2

Packet Flow Impediments in the VoIP System

Jitter buffers can only make a best effort to play out the packets that arrive at their input. The VoIP system (gateway) therefore has to ensure proper packet flow between the voice payloads and the physical interfaces of the VoIP system, such as Ethernet, wireless local area network (WLAN), universal serial bus (USB), and digital subscriber line (DSL). In general, users are getting VoIP systems with more processing power and IP network bandwidth, but the applications and interfaces that demand processing and bandwidth are also growing over time. In many implementations, the voice and fax chain modules have dedicated resources, while the processing of voice and fax packets is shared with several other applications on the host network processor. On the host processor, some applications may block processing for several voice frame durations. Such blocking occurs as random events and produces IP impediments in the form of packet bursts or packet drops. With several user applications and interfaces working together, it is essential to control voice packet delays inside the VoIP system. In simple terms, in the receive path, packets travelling from a physical interface to the adaptive jitter buffer (AJB) input must be guaranteed a small, fixed delay of a few milliseconds. The same applies to the send path: the payloads created by voice processing should reach the physical interfaces without jitter or drops. Some fixed delay in both directions is unavoidable, but fixed delays are less harmful than packet impediments.
20.3.3

AJB with Utilization of Silence Zones

On the network, voice packets encounter variable transit delays caused by variable queue lengths in routers or by traffic congestion. The AJB removes the jitter in packet arrival times and delivers packets synchronized with the voice processing algorithms and the PCM interface clock. Jitter buffer algorithms keep the buffering delay as short as possible while making the best use of all packets delivered by the network. Adaptive jitter buffers are also designed to handle various codecs and packet sizes, as well as voice, fax, and fax pass-through modes. However, many jitter buffer designs lack knowledge of speech and silence zones. To adjust their depth, such jitter buffers either drop a packet or insert silence as needed, and this adjustment may occur in the middle of speech, degrading voice quality.
Detecting silence zones in the same way that PESQ and other MOS-measuring algorithms identify utterances can help preserve or even improve voice quality. The PESQ algorithm identifies utterances, and any adjustment made in the middle of an utterance shows up as a lower PESQ-MOS, whereas silence adjustments made after the end of a valid utterance do not degrade the PESQ measure. By making adjustments only under these conditions, an AJB implementation can preserve voice quality and keep PESQ-MOS unchanged even during AJB adjustments; this type of operation also preserves voice quality under clock drift conditions.
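The silence-zone rule can be sketched as a small playout controller. This is a hypothetical illustration, not a specific product design: the target depth may change at any time (for example, from a jitter estimate, which is outside the sketch), but the applied playout delay only grows or shrinks on frames classified as silence, so no adjustment lands inside an utterance.

```python
"""Sketch: jitter buffer depth changes applied only in silence zones."""
from dataclasses import dataclass


@dataclass
class SilenceAwarePlayout:
    """Hypothetical controller. The target depth may change at any time,
    but the applied playout delay moves only on silence frames."""
    frame_ms: int = 10
    current_delay_ms: int = 40
    target_delay_ms: int = 40

    def set_target(self, target_ms: int) -> None:
        # Round down to whole frames; in practice the target would come from
        # a jitter estimate, which is outside this sketch.
        whole_frames = (target_ms // self.frame_ms) * self.frame_ms
        self.target_delay_ms = max(self.frame_ms, whole_frames)

    def on_frame(self, is_speech: bool) -> str:
        """Decide what to do with the next playout frame."""
        if is_speech or self.current_delay_ms == self.target_delay_ms:
            return "play"  # never stretch or shrink inside an utterance
        if self.current_delay_ms < self.target_delay_ms:
            self.current_delay_ms += self.frame_ms
            return "insert_silence"  # grow the buffer by one frame
        self.current_delay_ms -= self.frame_ms
        return "drop_silence"  # shrink the buffer by discarding a silence frame


if __name__ == "__main__":
    ajb = SilenceAwarePlayout()
    ajb.set_target(60)
    pattern = [True, True, False, False, False, True]
    print([ajb.on_frame(speech) for speech in pattern], ajb.current_delay_ms)
```

A pending depth change simply waits for the next silence frame; with clock drift, the same mechanism absorbs the slow accumulation of extra or missing samples.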
20.3.4

Packet Loss Concealment

End-to-end packet impediments in the VoIP system degrade perceived voice quality. Once packet delivery and jitter buffer operation have been made as good as possible, any further improvement against packet impediments depends on the robustness of the speech codec's PLC algorithm, which is covered in topic 5. Most low-bit-rate codecs have built-in packet loss concealment, so the achievable improvement is fixed for most of these codecs. For waveform-based codecs such as G.711 and G.726, several PLC options are available. For packet loss rates up to 5%, most of the more advanced techniques give a similar level of improvement. In deployments with higher packet loss, users may opt for proprietary decoder-based schemes. Transmitter-receiver-based techniques can perform better at higher loss rates, but their higher bandwidth demand can make the problem worse. The ITU-T G.711 PLC described in topic 5 has been found to match the performance of linear prediction (LP) and hybrid PLC techniques up to 5% packet loss. In recent deployments, several service providers have been targeting packet loss below 1%, in agreement with the TIA/EIA-116A (2006) recommendations. At this level, the ITU-T G.711 PLC algorithm performs as required without calling for proprietary implementations. For voice quality, it is recommended to incorporate a good PLC scheme, but the main effort should go into ensuring low end-to-end delay with no or minimal packet impediments.
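To show where a decoder-based PLC sits in the receive path, here is a deliberately simple frame-repetition scheme with fading. It is swapped in for illustration only and is not the ITU-T G.711 Appendix I algorithm, which uses pitch-based waveform substitution with overlap-add; the frame length (80 samples, 10 ms at 8 kHz), the 0.5 fade per lost frame, and the three-frame mute limit are all assumptions.

```python
"""Sketch: very simple frame-repetition PLC with fading (not G.711 Appendix I)."""
from typing import Iterable, List, Optional


def conceal(frames: Iterable[Optional[List[int]]],
            frame_len: int = 80,          # 10 ms at 8 kHz
            fade_per_frame: float = 0.5,  # assumed attenuation per lost frame
            max_repeats: int = 3) -> List[List[int]]:
    """Replace each lost frame (None) with an attenuated copy of the last good frame."""
    out: List[List[int]] = []
    last_good = [0] * frame_len
    lost_run = 0
    for frame in frames:
        if frame is not None:
            last_good, lost_run = list(frame), 0
            out.append(list(frame))
            continue
        lost_run += 1
        if lost_run > max_repeats:
            out.append([0] * frame_len)  # mute after a long loss burst
        else:
            gain = fade_per_frame ** lost_run
            out.append([int(round(s * gain)) for s in last_good])
    return out


if __name__ == "__main__":
    good = [1000] * 80
    result = conceal([good, None, None, good])
    print([f[0] for f in result])
```

Fading and eventual muting avoid the buzzy artifacts of repeating one frame indefinitely; better schemes track pitch so that the substituted waveform stays periodic.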
20.3.5

Echo Cancellation

Echo cancellation was discussed in detail in topic 6. This section discusses line echo cancellation in relation to voice quality. Line echo is always present with analog telephones, including in local PSTN-based systems. It is not perceived in the PSTN because of low end-to-end delays, line losses, and programmed padding losses at digital loop carriers (DLCs). In VoIP systems, end-to-end losses are also programmed, but end-to-end delays are longer than in the PSTN and vary over time. This increased delay makes an echo canceller mandatory in VoIP systems. Adaptive echo cancellers remove echo in two stages:
1. Linear part (measured as echo return loss enhancement, ERLE; typically 30 to 35 dB of echo removal)
2. Nonlinear part, which provides the additional required loss (typically 12 to 24 dB)
When end-to-end delays are around 50 ms, the first (linear) stage alone provides sufficient echo rejection, and relying on linear cancellation in this way gives better voice quality. The nonlinear stage provides additional rejection at higher end-to-end delays. G.131 specifies the required TELR as a function of the mean one-way transmission time, as shown in topic 6; echo becomes more perceptible as delay increases. For example, a one-way delay of 50 ms requires a total TELR of about 40 dB. The phone contributes about 10 dB of loss, so the echo canceller, send loudness rating (SLR), receive loudness rating (RLR), and padding losses in the signal path must provide the remaining 30 dB. Most echo cancellers achieve up to 30 dB with linear cancellation alone, which meets this requirement.
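The echo-loss arithmetic above can be written as a budget check. This sketch simply adds the contributions in dB, which is a simplification; the required TELR itself must still be read from the G.131 curve, and the 40-dB and 10-dB figures are the ones quoted in the text. The parameter `other_losses_db` is a hypothetical lump for SLR, RLR, and padding contributions.

```python
"""Sketch: echo-loss budget check for the 50-ms example above."""


def telr_margin_db(required_telr_db: float,
                   phone_loss_db: float,
                   linear_erle_db: float,
                   nlp_db: float = 0.0,
                   other_losses_db: float = 0.0) -> float:
    """Achieved echo loss minus required TELR; a value >= 0 meets the requirement.

    Contributions are simply added in dB, which is a simplification.
    """
    achieved = phone_loss_db + linear_erle_db + nlp_db + other_losses_db
    return achieved - required_telr_db


if __name__ == "__main__":
    # Figures from the text: 40 dB required at 50 ms, 10 dB from the phone.
    print(telr_margin_db(40.0, 10.0, 30.0))        # linear stage only
    print(telr_margin_db(40.0, 10.0, 25.0, 12.0))  # weaker ERLE plus NLP
```

A zero margin with the linear stage alone matches the text's conclusion; when ERLE falls short, the nonlinear stage has to make up the difference.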
An echo canceller also handles several other functions: comfort noise generation while shaping the echo residue, double-talk protection, modem/fax tone detection, convergence monitoring, and generation of the parameters required for voice quality monitoring. Many control-plane operations, together with nonlinear processing, influence how echo is perceived. Because echo becomes more perceptible with delay, reducing delay remains one of the main ways to help echo cancellation.
20.3.6

Voice Compression Codecs

In the PSTN, voice goes through a cascade of G.711 (μ-law/A-law) codecs, and the number of cascaded stages may increase on international calls. In communication between two VoIP gateways, G.711 voice is carried end to end without intermediate coding stages, so the number of quantization distortion units is lower.
Narrowband Coding. VoIP service providers are migrating back to G.711-based solutions as more Internet bandwidth becomes available. This migration allows VoIP voice quality to approach PSTN quality. For interoperability and wider deployment, VoIP systems also support several compression codecs, such as G.729AB, G.723.1A, G.726, and the GIPS family of codecs.
Wideband Coding. Wideband voice is one of the main ways of providing voice quality better than that of the existing narrowband PSTN. The PSTN and narrowband VoIP services use 300 to 3400 Hz, whereas wideband codecs cover 50 to 7000 Hz. By extending the bandwidth in this way, VoIP achieves a significant improvement in intelligibility and quality. Fricative sounds such as “s” and “f,” which are often very hard to distinguish in the telephony band, sound very natural in wideband speech. Low-delay wideband speech gives users a natural conversational experience.
At the time of writing, many VoIP systems on the market interoperate using the G.722 wideband codec. Depending on deployment requirements, other wideband codecs of the G.722 and G.729 families may also be supported. VoIP signaling and RTP support the wideband codec family. Wideband-capable telephones currently have limited availability, but wideband acoustic interfaces are expected to become more widely available as demand grows for VoIP voice quality beyond toll quality.
20.3.7

Transcoding and Conference Operation with Codecs

On international (inter-regional) PSTN calls, the number of G.711 cascades can reach four to seven [ITU-T-G.173 (1993)]. In communication between two VoIP systems, G.711 voice travels directly between the two end systems, so the number of quantization distortion units is lower. Calls from the PSTN to mobile phones may pass through multiple cascaded stages of Global System for Mobile Communications (GSM) codecs and G.711. With VoIP, voice packets can be sent directly using cell-phone-compatible codecs such as the adaptive multirate (AMR) codecs, which reduces the number of quantization stages even for international calls and calls to non-VoIP systems, improving overall voice quality. During a conference call, mixing is performed on linear samples, so the conference bridge must decode every incoming stream and re-encode the mixed output; this additional transcoding is unavoidable. When some users operate with low-bit-rate codecs such as G.729AB and others with G.711, voice quality degrades because of the multiple transcoding operations. It is recommended to perform conference mixing at a central location using low-compression codecs such as G.711. The location of a conference bridge and the delays involved are illustrated in Fig. 20.5(a) and 20.5(b). In summary, an end-to-end VoIP call with as few codec stages as possible gives better quality.
Transcoding, or tandeming, of speech codecs in an end-to-end VoIP system affects speech quality. The effective equipment impairment factor Ie-eff of the entire system depends on the types of codecs used in tandem. Under no packet loss conditions, Ie values are additive: Ie-eff equals the sum of the equipment impairment factors of the individual codecs [ITU-T-G.113 (2003)]. For example, tandeming G.726 (Ie = 7) with G.729 (Ie = 10) gives Ie-eff = 17; similarly, tandeming G.729 (Ie = 10) with G.723 (Ie = 15) gives Ie-eff = 25. More details on Ie-eff values for codecs in tandem can be found in P.833 [ITU-T-P.833 (2001)].
The effective impairment factors of wideband codecs in tandem are also additive. For wideband operation, the wideband equipment impairment factor Ie,wb of each codec should be used, and R0 should be taken as 129 [ITU-T-G.107 (2006)]. For example, tandeming two G.722 codecs (Ie,wb = 13, from topic 3) results in an effective Ie,wb of 26. Hence, when codecs operate in tandem in a VoIP system, the effective impairment factor increases, reducing the overall R-factor and voice quality.
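The additive rule can be turned into a first-level R-factor estimate. This sketch uses the Ie values quoted above, assumes the G.107 default narrowband R of 93.2 (all other parameters at defaults, with no extra delay or loss), uses the wideband value 129 given in the text, and applies the G.107 narrowband R-to-MOS mapping; every other impairment is ignored.

```python
"""Sketch: additive Ie for codecs in tandem and the resulting R-factor."""

NARROWBAND_R0 = 93.2  # assumed: G.107 R with default parameters, no delay or loss
WIDEBAND_R0 = 129.0   # wideband value quoted in the text

# Equipment impairment factors quoted in the text.
IE = {"G.726": 7, "G.729": 10, "G.723": 15}
IE_WB = {"G.722": 13}


def tandem_ie(*ie_values: float) -> float:
    """Ie-eff for codecs in tandem under no packet loss: a plain sum."""
    return float(sum(ie_values))


def r_factor(ie_eff: float, r0: float = NARROWBAND_R0) -> float:
    """R with every other impairment ignored (first-level approximation)."""
    return r0 - ie_eff


def mos_from_r(r: float) -> float:
    """G.107 narrowband R-to-MOS mapping (not valid for the wideband scale)."""
    if r <= 0:
        return 1.0
    if r >= 100:
        return 4.5
    return 1 + 0.035 * r + r * (r - 60) * (100 - r) * 7e-6


if __name__ == "__main__":
    ie = tandem_ie(IE["G.726"], IE["G.729"])
    r = r_factor(ie)
    print(f"G.726 + G.729: Ie-eff={ie}, R={r:.1f}, MOS={mos_from_r(r):.2f}")
    ie_wb = tandem_ie(IE_WB["G.722"], IE_WB["G.722"])
    print(f"G.722 + G.722: Ie,wb={ie_wb}, R={r_factor(ie_wb, WIDEBAND_R0):.1f}")
```

Each extra codec stage subtracts its full Ie from R, so even two modest codecs in tandem pull a narrowband call from an estimated MOS of about 4.4 down to about 3.9.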
Cordless Phones and Advantage Factor. A cordless phone has an analog interface with its base station. The base station communicates with the cordless handset using ADPCM, also known as G.726 compression. As explained in topic 3, G.726 supports multiple rates; cordless handset-to-base-station communication uses G.726 at 32 kbps. As per the E-model, mobile phones are given an advantage factor of 5, while the cordless phone's G.726 at 32 kbps has an equipment impairment factor Ie of 7 [ITU-T-G.113 (2007)]. As a first-level approximation, the R-factor is therefore reduced by 2 (5 - 7 = -2) instead of being increased by the advantage factor of 5. Cordless phones and their Ie degradation have received little attention in the literature. In general, cordless phones do reduce voice quality, and this reduction is approximately compensated by the advantage factor A in voice quality estimation. The cordless link also acts as a tandem stage of G.726 with the other codecs used in the VoIP system. In practice, cordless phone G.726 coding can limit voice quality.
20.3.8

Codecs and Congestion

When trading off packet loss against quality, waveform-based, low-compression codecs such as G.711 can give better quality than low-bit-rate codecs such as G.729AB. Under higher packet loss, forward error correction (FEC)/redundancy techniques help more than decoder-based PLC. Redundancy, however, demands more bandwidth, and in bandwidth-limited systems the extra traffic can cause even more packet drops. In such situations, it is worth renegotiating to a low-bit-rate codec instead of continuing with G.711, because G.729AB with FEC/redundancy may still use less bandwidth than G.711. Bandwidth requirements for different codecs are given in topic 11.
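The claim that G.729AB with redundancy can undercut plain G.711 is easy to check. This sketch assumes redundancy is carried in the RTP payload format for redundant audio (RFC 2198), with a 4-byte header per redundant block and a 1-byte header for the primary block, and that each redundant copy uses the same codec and size as the primary; IPv4/UDP/RTP headers are counted and layer 2 is ignored.

```python
"""Sketch: IP bandwidth with RFC 2198-style audio redundancy."""

HEADER_BYTES = 20 + 8 + 12  # IPv4 + UDP + RTP; layer 2 ignored


def stream_kbps(codec_kbps: float, packet_ms: float, redundant_copies: int = 0) -> float:
    """Bandwidth when each packet also carries copies of earlier payloads.

    Assumes redundant copies use the same codec and size as the primary payload.
    """
    frame_bytes = codec_kbps * packet_ms / 8
    payload = frame_bytes * (1 + redundant_copies)
    if redundant_copies:
        # RFC 2198: 4-byte header per redundant block + 1-byte primary block header.
        payload += 4 * redundant_copies + 1
    return (payload + HEADER_BYTES) * 8 / packet_ms


if __name__ == "__main__":
    print("G.711, no redundancy:", stream_kbps(64, 20))
    print("G.729, 1 redundant copy:", stream_kbps(8, 20, 1))
    print("G.729, 2 redundant copies:", stream_kbps(8, 20, 2))
```

At 20-ms packetization, G.729 with one or even two redundant copies stays well below the 80 kbit/s of unprotected G.711, which is the reasoning behind renegotiating to a low-bit-rate codec under congestion.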
20.3.9

Country-Specific Deviations

To the phone, the VoIP adapter looks like a PSTN central office (CO), so it has to emulate the characteristics of the PSTN CO. The front-end SLIC of the VoIP gateway is programmed for the required voltages, line conditions, ringer equivalence number (REN) drive, impedances, gains/losses, and diagnostic features. The right combination of phones, central offices, and transmission lines provides better quality. Country-specific deviations and catering to the requirements of multiple countries are covered in topic 17.
20.3.10

Signal Transmission Characteristics

In VoIP adapters, a foreign exchange subscriber (FXS) interface is used for connecting the telephone. This interface has to meet certain switching and transmission characteristics for better voice quality. Transmission requirements for North America are specified in the TR-57 document, and similar requirements are found in local PSTN standards elsewhere. Although TR-57 does not address VoIP, meeting its specifications for the FXS interface is expected to give speech quality closely matching the PSTN under good network conditions. TR-57 measurements are for narrowband voice and fall into two categories: signaling (switching) characteristics and transmission characteristics.
In a VoIP system, call setup delays are longer than in a PSTN-based system, and they vary widely with distance, network conditions, and the supporting VoIP infrastructure. Some signaling events may take much longer than PSTN-based call signaling, and many signaling and call-establishment timings in VoIP systems exceed the TR-57 switching requirements. These signaling delays may not directly degrade voice quality, but reducing them makes interactions more natural and closer to the PSTN service.
The documents IEEE STD-743 (1995) and TIA/EIA-470C (2003) provide additional specifications and tests for wideband characterization. As wideband voice matures, several new specifications are expected to be added for end-to-end wideband voice quality.
20.3.11

Transmission Loss Planning

The goal of establishing a loss plan for voice communication systems is to deliver received speech at a comfortable listening level. The received loudness depends on the talker's speech level, the transmit and receive efficiencies of the voice terminals, and the loss in the system and the intervening network. It is generally accepted that a connection with an overall loudness rating of 10 dB provides a high degree of satisfaction for most users. More details on losses and loudness ratings are given in topic 6.
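A loss plan ultimately balances the loudness ratings along the connection. This sketch uses the ITU-T relation OLR = SLR + CLR + RLR, where CLR (circuit loudness rating) covers the network between the two terminals, and computes the pad or gain needed to reach the 10-dB target mentioned above. The terminal and circuit values in the usage line are hypothetical.

```python
"""Sketch: overall loudness rating (OLR) check for a connection."""

TARGET_OLR_DB = 10.0  # the "high degree of satisfaction" value quoted in the text


def overall_loudness_rating(slr_db: float, clr_db: float, rlr_db: float) -> float:
    """OLR = SLR + CLR + RLR (loudness ratings add in dB)."""
    return slr_db + clr_db + rlr_db


def adjustment_db(slr_db: float, clr_db: float, rlr_db: float,
                  target_db: float = TARGET_OLR_DB) -> float:
    """Positive: extra loss (pad) to add; negative: gain needed to reach the target."""
    return target_db - overall_loudness_rating(slr_db, clr_db, rlr_db)


if __name__ == "__main__":
    # Hypothetical terminal and circuit values, for illustration only.
    print(adjustment_db(slr_db=7.0, clr_db=0.0, rlr_db=1.0))  # -> 2.0 (dB of pad)
```

Because loudness ratings are losses, a connection that is too loud gets padding and one that is too quiet needs gain; in a VoIP gateway these adjustments are typically programmed in the SLIC-CODEC or DSP gain stages.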
20.3.12

SLIC-CODEC Interface Configurations

SLIC-CODEC devices exchange samples with the processor over a PCM interface. This interface operates with 8-kHz frame synchronization and accommodates time slots as required; systems with fewer channels typically use a 24- or 32-channel PCM interface. The PCM interface uses A-law, μ-law, or 16-bit linear samples between the SLIC-CODEC devices and the processor. Using 16-bit linear samples minimizes quantization distortion and helps achieve better linear echo cancellation. For wideband voice, support for linear samples is required to obtain better voice quality.
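The quantization penalty of companded samples can be seen with a μ-law round trip. This sketch is written from scratch in the common bias-0x84 formulation of G.711 μ-law found in widely circulated public-domain code; it is an illustration of the companding step, not a certified G.711 implementation.

```python
"""Sketch: G.711 mu-law companding round trip versus 16-bit linear samples."""

BIAS = 0x84   # 132, added before the segment search
CLIP = 32635  # largest magnitude that still encodes without overflow


def linear_to_ulaw(sample: int) -> int:
    """Encode a 16-bit linear PCM sample to an 8-bit mu-law code."""
    sign = 0x80 if sample < 0 else 0x00
    magnitude = min(abs(sample), CLIP) + BIAS
    exponent = max(0, magnitude.bit_length() - 8)       # segment 0..7
    mantissa = (magnitude >> (exponent + 3)) & 0x0F     # 4-bit step within segment
    return ~(sign | (exponent << 4) | mantissa) & 0xFF  # codes are transmitted inverted


def ulaw_to_linear(code: int) -> int:
    """Decode an 8-bit mu-law code back to 16-bit linear PCM."""
    code = ~code & 0xFF
    exponent = (code >> 4) & 0x07
    mantissa = code & 0x0F
    magnitude = (((mantissa << 3) + BIAS) << exponent) - BIAS
    return -magnitude if code & 0x80 else magnitude


if __name__ == "__main__":
    for x in (10, 100, 1000, 10000, -20000):
        y = ulaw_to_linear(linear_to_ulaw(x))
        print(f"{x:7d} -> {y:7d}  error {x - y:5d}")
```

The round-trip error grows with amplitude (2 for a sample of 10, 12 for a sample of 1000), whereas 16-bit linear samples reach the echo canceller unchanged, which is why a linear PCM interface helps the linear cancellation stage.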
20.3.13

DTMF Rejection as Annoyance

Various DTMF operations, including tone generation and tone detection, are discussed in topic 7. PSTN users are accustomed to hearing the digits they dial. In VoIP, however, because DTMF is carried through combinations of in-band and out-of-band methods, users can be annoyed by dialed digits heard during an established voice call. DTMF rejection and related issues are also discussed in topic 7. Residual tones and DTMF ticks create undesired disturbances. It is also essential to minimize false digits, since a false digit can trigger wrong operations in a call and disturb in-band voice.
20.3.14

QoS Considerations

Quality of service is fundamental to VoIP network management: it underpins voice quality performance at end devices and ensures the expected service quality. QoS is addressed with layer-3 and layer-2 techniques. The bottom-line goal of upstream QoS is to deliver every voice/fax packet without drops, with delay and jitter of only a few milliseconds (5 to 10 ms in the worst case). The techniques used vary with the overall system, its interfaces, and the provisioned applications. Typical issues include media packets waiting in packet queues for a long time and experiencing varying queue delays (jitter).
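One common way to meet the upstream goal is strict-priority queuing at the egress interface. The class below is a hypothetical sketch, not a specific router or gateway API: media packets are always sent before data packets, and only data packets are tail-dropped when their queue fills.

```python
"""Sketch: strict-priority upstream queue for media packets."""
from collections import deque
from typing import Any, Deque, Optional


class StrictPriorityUplink:
    """Hypothetical egress scheduler: voice/fax always leaves before data."""

    def __init__(self, max_data_packets: int = 64) -> None:
        self.media: Deque[Any] = deque()
        self.data: Deque[Any] = deque()
        self.max_data_packets = max_data_packets

    def enqueue(self, packet: Any, is_media: bool) -> bool:
        """Queue a packet; returns False if it was dropped."""
        if is_media:
            self.media.append(packet)  # media is never dropped here
            return True
        if len(self.data) >= self.max_data_packets:
            return False               # tail-drop data only
        self.data.append(packet)
        return True

    def dequeue(self) -> Optional[Any]:
        """Next packet for the physical interface, media first."""
        if self.media:
            return self.media.popleft()
        if self.data:
            return self.data.popleft()
        return None


if __name__ == "__main__":
    q = StrictPriorityUplink(max_data_packets=2)
    for pkt, media in (("d1", False), ("d2", False), ("v1", True)):
        q.enqueue(pkt, media)
    print([q.dequeue() for _ in range(3)])
```

Strict priority alone does not bound delay when a large data packet is already being serialized on a slow uplink: a 1500-byte packet at 256 kbit/s occupies the link for about 47 ms. This is why fragmentation of large packets is listed among the QoS mechanisms that affect delay.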
The downstream direction is another highly troublesome aspect of VoIP systems. The downstream bandwidth supplied by the Internet service provider is usually much higher than the upstream bandwidth, so many designers ignore downstream voice packet issues. However, users' demand for downstream data bandwidth is also much higher: downloads, data, video, and other media may all run simultaneously, and several members of a household may share the available bandwidth. In practice, even high-throughput downstream bandwidth gets driven to the point of congestion. During congestion, data carried over the Transmission Control Protocol (TCP) is recovered through retransmission, but voice and fax carried over the User Datagram Protocol (UDP) encounter drops. In the downstream direction, packet impediments mainly occur at the VoIP system or at Internet interfaces such as DSL and Ethernet.
Some ISP CO equipment may apply QoS to the traffic it sends toward the subscriber, which works like the upstream QoS of the end-user system. In the absence of any QoS mechanism from the service provider, it is essential to control downstream bandwidth demand from the end-user router and VoIP system. Statistics reported in RTCP and RTCP-XR packets can indicate packet drops, and these reports can be used to throttle low-priority applications that demand downstream bandwidth. In summary, downstream QoS cannot be applied as directly as upstream QoS, so some proprietary techniques are needed to ensure that the end user's growing bandwidth demand does not cause packet impediments. With increasing awareness of RTCP-XR packet details, several simple techniques are likely to be adopted to control media packet flow and reduce packet impediments.
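One simple technique of this kind is sketched below as a hypothetical controller (not a standard mechanism): voice packet loss reported through RTCP drives an additive-increase, multiplicative-decrease (AIMD) rate cap on low-priority downstream traffic. The only protocol detail relied on is that the RTCP receiver report's “fraction lost” field is an 8-bit fixed-point value (lost/256); the 1% target, back-off factor, step, and floor are assumptions.

```python
"""Sketch: throttling low-priority downloads from RTCP loss feedback."""
from dataclasses import dataclass


def loss_from_rr_field(fraction_lost: int) -> float:
    """RTCP RR 'fraction lost' is an 8-bit fixed-point value (lost/256)."""
    return fraction_lost / 256.0


@dataclass
class DownstreamThrottle:
    """Hypothetical AIMD rate cap applied to low-priority downstream traffic."""
    link_kbps: float
    cap_kbps: float
    loss_target: float = 0.01     # assumed 1% target for voice packet loss
    decrease: float = 0.7         # multiplicative back-off on excess loss
    increase_kbps: float = 100.0  # additive recovery when loss is acceptable
    floor_kbps: float = 64.0      # never starve data completely

    def on_report(self, voice_loss: float) -> float:
        if voice_loss > self.loss_target:
            self.cap_kbps = max(self.floor_kbps, self.cap_kbps * self.decrease)
        else:
            self.cap_kbps = min(self.link_kbps, self.cap_kbps + self.increase_kbps)
        return self.cap_kbps


if __name__ == "__main__":
    throttle = DownstreamThrottle(link_kbps=8000.0, cap_kbps=8000.0)
    for field in (13, 8, 0, 0):  # raw RR fraction-lost values
        print(round(throttle.on_report(loss_from_rr_field(field))))
```

Backing off quickly and recovering slowly keeps voice protected during bursts of download activity while eventually returning the full link to data when voice loss stays low.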
20.3.15

GR-909 Telephone Interface Diagnostics

For several decades, PSTN service providers have refined many self-diagnostic and terminal interface monitoring tests through electronic central offices and DLCs. These tests allow a telephone service provider to remotely check the first-level health of the telephone interface without visiting the customer premises. The important diagnostics specific to the telephone interface are as follows:
• Line voltage, DC voltages, and battery voltages
• Receiver off-hook/on-hook
• Ring and ringer equivalence
• Resistive faults
• Loopback tests
Many of these features are given in GR-909 and SLIC-CODEC manufacturer datasheets. Some proprietary monitoring features have also been added to suit customer requirements. In VoIP, with the incorporation of the PSTN type of diagnostics, it is possible to diagnose the problems remotely through network interfaces and to provide possible updates or solutions. Many service providers are incorporating their own management interface to conduct GR-909 tests on VoIP solutions. Incorporating GR-909 tests may not improve voice quality, but it helps in monitoring the interfaces for quality. In the next sections, voice quality monitoring and RTCP-XR packets are discussed to help in monitoring the voice quality based on signal characteristics and packet flow. Several VoIP software and hardware solutions provide these operations [URL (PIQUA)].
20.3.16

Miscellaneous Aspects of Voice Quality

Some other important items that contribute to the voice quality are given as summary points below:
• Several phone call impediments are described under manual-call VoIP voice testing in topic 13.
• Ensuring clock precision and making systems perform under wide environmental conditions is essential. Laboratory tests on crystals and clock oscillators may show very low PPM (parts-per-million) error, but PPM error must be kept under control under practical worst-case conditions.
• PSTN telephone interfaces support the required REN, up to a maximum of 5 REN, and PSTN systems also drive long loops from the DLC. VoIP system interfaces are often designed for lower REN to reduce the cost of customer premises equipment, which can degrade voice quality when several phones are connected in parallel. This degradation has to be considered while building the total system.
• Switching and call feature tones have to match the local PSTN systems closely. This matching improves call interactions and preserves an experience similar to that of PSTN calls.
• Voice clipping must be limited to less than 0.2% to 0.5% of active speech [ITU-T-G.116 (1999)]. Clipping can occur with VAD/CNG and at transitions between speech and silence.
• Systems should be able to support future upgrades to wideband voice and other applications.
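The clock-precision point can be quantified. This sketch assumes that the ppm offset is the combined frequency error between the sending and receiving sampling clocks, and it shows how quickly that offset accumulates into whole 10-ms frames that the jitter buffer must absorb by inserting or dropping samples; the ppm values are hypothetical.

```python
"""Sketch: playout drift caused by a sampling-clock offset between two ends."""

SAMPLE_RATE_HZ = 8000


def drift_ms(ppm_offset: float, call_seconds: float) -> float:
    """Accumulated timing difference (ms) after call_seconds at ppm_offset."""
    return ppm_offset * 1e-6 * call_seconds * 1000.0


def seconds_per_frame_slip(ppm_offset: float, frame_ms: float = 10.0) -> float:
    """Time until the drift amounts to one whole frame the AJB must absorb."""
    return frame_ms / (ppm_offset * 1e-6 * 1000.0)


if __name__ == "__main__":
    for ppm in (10, 50, 100):
        ms = drift_ms(ppm, 3600)
        print(f"{ppm:3d} ppm: {ms:6.1f} ms/hour "
              f"({ms * SAMPLE_RATE_HZ / 1000:.0f} samples), "
              f"one 10-ms frame every {seconds_per_frame_slip(ppm):.0f} s")
```

A combined offset of 50 ppm adds up to 180 ms (1440 samples at 8 kHz) over an hour-long call, or one frame every 200 s, so worst-case clock behavior directly sets how often the AJB has to adjust.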