20.2 E-MODEL-BASED VOICE QUALITY ESTIMATION (VoIP)
The E-model is the equipment impairment model described in G.107 [ITU-T-G.107 (2005)]. It examines signal and packet transmission characteristics to predict voice quality on a linear scale. The objective of the E-model is to determine a transmission quality rating R (the R-factor or R-value) that incorporates the “mouth to ear” characteristics of an end-to-end speech path. In practice, the E-model is also commonly referred to as the R-model. The model helps in analyzing and identifying the root causes of voice quality degradation. The R-factor can be mapped to the subjective MOS and to several other voice quality parameters. The typical useful range of the R-factor is 50 to 94 for narrowband telephony; an R-value below 50 is not suitable for continuing a conversation. Voice quality monitoring (VQmon) [URL (Telchemy)] and RTCP-XR packets [Friedman et al. (2003)] in VoIP make use of E-model parameters, and the E-model is widely used for VoIP service quality measurements.
An E-model-based voice quality estimate for multiple voice compression codecs is given in Section 3.7. More details on this topic are available in [ITU-T-G.107 (2005), ITU-T-G.113 (2007), ITU-T-G.108 (1999), ITU-T-G.175 (2000)]. This section gives an overview together with some extensions from the published literature and recommendations. The E-model makes use of several parameters, broadly classified under delay, delay variation, echo, noise, and phone characteristics as well as signal transmission, codec, and packet characteristics. In narrowband telephony, the R-factor ranges from 0 to 100, with 100 corresponding to a MOS of 4.5, which is achieved only with direct linear 16-bit samples. Voice compression reduces the R-factor. In an end-to-end digital service such as ISDN with G.711, the highest possible R-factor is 93.2. In VoIP service, several impediments contribute to the degradation of the R-factor. In the first version of the G.107 recommendation, the R-factor under ideal network conditions was taken to be 94.2. This value was revised to 93.2 in later versions of G.107 [ITU-T-G.107 (2005)].
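From the R-factor, additional parameters can be derived, such as MOS, the minimum percentage of people who rate the call Good or Better (GoB), the maximum percentage of people who rate it Poor or Worse (PoW), and so on. The R-factor is mapped to the subjective voice quality measure MOS using the following equations:

\[
\mathrm{MOS} =
\begin{cases}
1, & R \le 0 \\
1 + 0.035\,R + R\,(R - 60)(100 - R) \cdot 7 \times 10^{-6}, & 0 < R < 100 \\
4.5, & R \ge 100
\end{cases}
\]
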
Table 20.2 lists the R-value ranging from 50 to 100 in selected steps. The corresponding MOS, GoB, PoW, and qualitative user satisfaction limits are given. It can be observed that R from 90 to 100 corresponds [ITU-T-G.109 (1999)] to best quality, 80 to 89 is high quality, 70 to 79 is medium, 60 to 69 is low, and 50 to 59 is poor. A rating below 50 indicates unacceptable quality.
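The MOS, GoB, and PoW columns of Table 20.2 can be checked numerically. The following is a minimal sketch, assuming the cubic R-to-MOS mapping and the Gaussian GoB/PoW expressions of G.107; the function names are illustrative and are not taken from any recommendation.

```python
import math


def r_to_mos(r: float) -> float:
    """Estimated MOS for a given R-factor (cubic mapping assumed from G.107)."""
    if r <= 0:
        return 1.0
    if r >= 100:
        return 4.5
    return 1.0 + 0.035 * r + r * (r - 60.0) * (100.0 - r) * 7.0e-6


def _normal_cdf(x: float) -> float:
    """Cumulative distribution function of the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def r_to_gob(r: float) -> float:
    """Percentage of users rating the call Good or Better (assumed G.107 form)."""
    return 100.0 * _normal_cdf((r - 60.0) / 16.0)


def r_to_pow(r: float) -> float:
    """Percentage of users rating the call Poor or Worse (assumed G.107 form)."""
    return 100.0 * _normal_cdf((45.0 - r) / 16.0)


if __name__ == "__main__":
    for r in (94, 80, 70, 60, 50):
        print(r, round(r_to_mos(r), 2), round(r_to_gob(r)), round(r_to_pow(r)))
```

The computed values agree with the table to within rounding; small differences of about one percentage point in GoB and PoW are to be expected from the rounding used in the printed table.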
In general, deviations will occur between the E-model-based MOS and the PESQ-based MOS. The R-factor-based MOS is slightly higher than the PESQ-based MOS for similar impairments. The R-factor-based MOS is reported as listening quality (LQ) and conversational quality (CQ). The MOS obtained by considering only coding distortions and packet losses is called MOS-LQ. The MOS obtained by considering delay and loudness impairments in addition to the distortion impairments is called MOS-CQ. These MOS and R-factor values are used in RTCP-XR reports [Friedman et al. (2003)].

Table 20.2. Relation Between R-Value, Corresponding MOS, and User Satisfaction for Selected R-Factor Values

R-Value         MOS     GoB %           PoW %           User Satisfaction
(Lower Limit)           (Lower Limit)   (Upper Limit)
100             4.5     99              0               More than PSTN; comes only with 16-bit linear samples
94              4.42    98              0               Very satisfied (R is 90 to 100)
92              4.38    98              0
90              4.34    97              0               PSTN quality; achieved in VoIP with G.711 as best case
88              4.29    96              0               Satisfied (R is 80 to 89)
86              4.23    95              1
84              4.17    93              1
82              4.10    92              1
80              4.02    90              1
79              3.99    88              2               Some users dissatisfied (R is 70 to 79)
78              3.95    87              2
76              3.86    84              3
74              3.78    81              3
72              3.69    77              5
70              3.60    73              6
69              3.55    71              7               Many users dissatisfied (R is 60 to 69)
65              3.35    62              11
60              3.10    50              17
55              2.84    38              27              Nearly all users dissatisfied (R is 50 to 59)
50              2.58    27              38

20.2.1 R-Factor Calculations

The R-factor is a transmission rating factor for the quality of voice in VoIP. It is a scalar prediction that ranges from 0 to 100 for narrowband voice communication. The end equipment used, room noise, losses in the network, delay, packet loss, and the compression algorithms used all affect the R-value. The value of the R-factor is computed by the following equation:

\[
R = R_0 - I_s - I_d - I_{e\text{-}eff} + A
\]

R0 is the highest value of R; it mainly takes into account the signal-to-noise ratio (SNR), including noise sources such as circuit noise, room noise, and subscriber line noise.
R0 is a function of (Nc, SLR, Ps, Ds, RLR, Pr, LSTR)
Is comprises impairments that occur simultaneously with the voice signal. The major factors that contribute to this impairment are the loudness ratings of the telephone set, the number of quantization distortion units, and the side-tone loudness rating.
Is is a function of (R0, SLR, RLR, STMR, TELR, qdu)
Id comprises impairments caused by delay. The factors that contribute to these impairments are the amount of delay present in the network and the values of the talker and listener echo loudness ratings.
Id is a function of (T, Tr, Ta, RLR, STMR, TELR, WEPL)
Ie-eff is the effective equipment impairment factor, which mainly comprises impairments caused by distortion. The main parameters that contribute to Ie-eff are the voice compression codec and the end-to-end packet impediments.
Ie-eff is a function of (Ie, Bpl, Ppl, BurstR)

“A” is the advantage factor, which represents the user tolerance to degradation of the voice quality. The value of “A” is governed by the end-user communication interface. For wire-bound communication, the value of “A” is zero, and it is “5” for mobility. The value of “A” can only offset the contributions from Ie-eff, Id, and Is; it cannot increase the R0 value.

Table 20.3. ITU-T Recommendation G.107: Default Values and Permitted Ranges for the Parameters Used in the E-Model R-Factor Calculation

Symbol   Parameter Name                             Default Value   Units    Allowed Range
SLR      Send loudness rating                       +8              dB       0 to 18
RLR      Receive loudness rating                    +2              dB       -5 to 14
STMR     Side tone masking rating                   15              dB       10 to 20
LSTR     Listener side tone rating                  18              dB       13 to 23
Ds       D-value of phone at send side              3               -        -3 to 3
Dr       D-value of phone at receive side           3               -        -3 to 3
TELR     Talker echo loudness rating                65              dB       5 to 65
WEPL     Weighted echo path loss                    110             dB       5 to 110
T        Mean one-way delay of echo path            0               ms       0 to 500
Tr       Round-trip delay                           0               ms       0 to 1000
Ta       Absolute delay in echo-free connection     0               ms       0 to 500
qdu      Number of quantization distortion units    1               -        1 to 14
Ie       Equipment impairment factor                0               -        0 to 40
Bpl      Packet loss robustness factor              1               -        1 to 40
Ppl      Random packet loss probability             0               %        0 to 20
BurstR   Burst ratio                                1               -        1 to 2
Nc       Circuit noise referred to 0-dBr point      -70             dBm0p    -80 to -40
Nfor     Noise floor at receive side                -64             dBmp     -64
Ps       Room noise at send side                    35              dB(A)    35 to 85
Pr       Room noise at receive side                 35              dB(A)    35 to 85
A        Advantage factor                           0               -        0 to 20

The detailed steps of the R-factor calculation and its parameters are given in G.107. A summary of the parameters used, with their default values and permitted ranges [ITU-T-G.107 (2005), Britt (2007)], is given in Table 20.3. Details on the decibel units used in Table 20.3 are given in [ITU-T-G.100.1 (2001), TIA/EIA-116 (2001)].
With the default values specified for room noise conditions and signal levels, R0 – Is evaluates to 93.2. This value holds for good implementations of hardware and front ends that meet TR-57 or local PSTN transmission characteristics.
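The way the individual impairments combine into R can be illustrated with a short sketch. It assumes the G.107 combination R = R0 – Is – Id – Ie-eff + A, takes R0 – Is as the default 93.2 quoted above, and clamps the result to the 0 to 100 narrowband range; the function name and the clamping are illustrative choices, not part of the recommendation.

```python
def r_factor(i_d: float = 0.0, ie_eff: float = 0.0, advantage: float = 0.0,
             r0_minus_is: float = 93.2) -> float:
    """Combine E-model impairments into an R-factor.

    i_d         delay impairment Id
    ie_eff      effective equipment impairment Ie-eff
    advantage   advantage factor A (0 for wire-bound access)
    r0_minus_is R0 - Is, assumed here to be the default value of 93.2
    """
    r = r0_minus_is - i_d - ie_eff + advantage
    # Keep the result inside the narrowband R range of 0 to 100.
    return max(0.0, min(100.0, r))


if __name__ == "__main__":
    print(r_factor())                       # no delay or codec impairment
    print(r_factor(i_d=2.4, ie_eff=11.0))   # example: 100 ms delay with G.729AB
```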
Delay Impairments: Id. The delay impairment factor Id accounts for impairments caused by loss of interactivity between the communicating ends. This loss results from long absolute delay and perceivable echo. Id is calculated from the delay and from talker and listener echo. The talker echo loudness rating (TELR) is the loudness loss of the speaker’s own voice reaching the speaker’s ear as a delayed echo. As the delay increases, the loudness loss should also increase as per G.131 [ITU-T-G.131 (2003)]. Assuming good echo cancellation (i.e., TELR > 65 dB), the delay impairment factor can be modeled as a function of the one-way absolute delay “Ta” in milliseconds. Ta is the cumulative sum of all one-way delay components involved in a voice call. An increase in absolute delay causes users to feel that interactivity in the conversation is lost. In relation to G.108 [ITU-T-G.108 (1999)], assuming that all other impairments are under control, a high one-way absolute delay increases the Id value. A one-way delay of more than 150 ms causes more degradation of voice quality [ITU-T-G.108 (1999)]. Based on the graph given in G.108, the formulation [Sun and Ifeachor (2003)] for Id based on one-way delay is expressed as

\[
I_d = 0.024\,T_a + 0.11\,(T_a - 177.3)\,H(T_a - 177.3), \qquad
H(x) =
\begin{cases}
0, & x < 0 \\
1, & x \ge 0
\end{cases}
\]

The delay contribution becomes dominant beyond 177.3 ms when the TELR value is > 65 dB. In the figures of the G.131 recommendation [ITU-T-G.131 (2003)] for a TELR of 65 dB, the R-factor degradation shows delay dominance starting from 150 ms, which is reasonably close to the 177.3-ms breakpoint.
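The delay impairment discussed above can be sketched as a piecewise-linear function. The coefficients 0.024 and 0.11 and the 177.3-ms breakpoint are assumed from the simplified formulation attributed to [Sun and Ifeachor (2003)], which is valid only when echo is well controlled; the function name is illustrative.

```python
def delay_impairment(ta_ms: float) -> float:
    """Delay impairment Id for a one-way absolute delay Ta in milliseconds.

    Simplified model assuming good echo cancellation (TELR of at least 65 dB).
    """
    if ta_ms < 0:
        raise ValueError("one-way delay cannot be negative")
    i_d = 0.024 * ta_ms
    if ta_ms > 177.3:
        # Beyond the breakpoint the impairment grows much faster with delay.
        i_d += 0.11 * (ta_ms - 177.3)
    return i_d


if __name__ == "__main__":
    for ta in (50, 100, 150, 200, 300, 400):
        print(ta, round(delay_impairment(ta), 2))
```

For example, a 100-ms one-way delay costs about 2.4 R-units, whereas 300 ms costs about 20.7 R-units, which shows why the guideline delays are kept well below the breakpoint.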
Equipment Impairment Factor: Ie-eff. The effective equipment impairment factor Ie-eff is computed by considering the voice compression codec used and the end-to-end packet losses. According to ITU-T G.107 [ITU-T-G.107 (2005)], the effective impairment factor for narrowband codecs under random packet loss is calculated from the following equation:

\[
I_{e\text{-}eff} = I_e + (95 - I_e)\,\frac{P_{pl}}{P_{pl} + B_{pl}} \tag{20.1}
\]

In the above equation, Ie is the default equipment impairment value of the codec used [ITU-T-G.113 (2003)] under zero packet loss conditions. The parameter Ppl is the random packet loss rate. The random packet loss rate implies that the probability of loss of a packet is independent of the probability of loss of any other packet. The Bpl is the packet loss robustness factor assigned to each codec. This value reflects the ability to recover the packet loss using a method of packet loss concealment.
Equipment impairment values for different packet loss conditions were formerly given as codec-dependent tabulated values [ITU-T-G.113 (2003)]. Calculations using the parameters Ie, Ppl, and Bpl along with Eq. (20.1) provide similar results, and with Eq. (20.1) the impairments can be calculated for any required condition. Under random packet loss, the values of Ie and Bpl for different narrowband codecs used in VoIP are given below [ITU-T-G.113 (2001), ITU-T-G.113 (2007)].
• Codec G.711 with PLC at 10-ms packetization: Ie = 0 at zero percent packet drop, with a Bpl value of 25.1.
• Codec G.711 without PLC at 10-ms packetization: Ie = 0, with a Bpl value of 4.3.
• Codec G.729AB at 20-ms packetization: Ie = 11 at zero percent packet drop, with a Bpl value of 19.
• Codec G.723.1A at 30-ms packetization: Ie = 15 at zero percent packet drop, with a Bpl value of 16.1.
Table 20.4 gives Ie-eff, R, and the listening MOS for two popular codecs, G.711 and G.729AB, with voice activity detection (VAD). The values are calculated using Eq. (20.1). From the table, it can be observed that voice compression creates major degradation. Packet losses degrade voice quality even more than other impairments such as delay, echo, and signal transmission.
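The model columns of Table 20.4 can be regenerated from the codec values listed above. The sketch below assumes the random-loss form Ie-eff = Ie + (95 – Ie) Ppl / (Ppl + Bpl), R = 93.2 – Ie-eff with no delay impairment, and the G.107 cubic R-to-MOS mapping; the dictionary keys and function names are illustrative.

```python
# (Ie, Bpl) pairs taken from the codec list above.
CODECS = {
    "G.711 with PLC": (0.0, 25.1),
    "G.711 without PLC": (0.0, 4.3),
    "G.729AB": (11.0, 19.0),
    "G.723.1A": (15.0, 16.1),
}


def ie_eff_random(ie: float, bpl: float, ppl: float) -> float:
    """Effective equipment impairment for random loss; ppl is in percent."""
    return ie + (95.0 - ie) * ppl / (ppl + bpl)


def r_to_mos(r: float) -> float:
    """Estimated MOS for a given R-factor (cubic mapping assumed from G.107)."""
    if r <= 0:
        return 1.0
    if r >= 100:
        return 4.5
    return 1.0 + 0.035 * r + r * (r - 60.0) * (100.0 - r) * 7.0e-6


def listening_quality(codec: str, ppl: float, r0_minus_is: float = 93.2):
    """Return (Ie-eff, R, MOS-LQ) for a codec at a given packet loss percentage."""
    ie, bpl = CODECS[codec]
    ie_eff = ie_eff_random(ie, bpl, ppl)
    r = r0_minus_is - ie_eff
    return ie_eff, r, r_to_mos(r)


if __name__ == "__main__":
    for ppl in range(0, 11):
        g711 = listening_quality("G.711 with PLC", ppl)
        g729 = listening_quality("G.729AB", ppl)
        print(ppl, [round(v, 2) for v in g711], [round(v, 2) for v in g729])
```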
In practical deployments, packet drops are not purely random and can occur in bursts. Bursty losses depend on the loss statistics of previous packets, whereas random losses are independent. A bursty loss is a high packet loss in a short time interval. During a burst, a high proportion of packets are either lost or discarded because of late arrival. A bursty drop creates more degradation than random loss. The equipment impairment factor under bursty loss is given as [ITU-T-G.107 (2005)]

\[
I_{e\text{-}eff} = I_e + (95 - I_e)\,\frac{P_{pl}}{\dfrac{P_{pl}}{BurstR} + B_{pl}} \tag{20.2}
\]

Table 20.4. Random Packet Loss Percentage and Corresponding MOS for G.711 (10-ms packetization) and G.729AB (20-ms packetization) Codecs (PESQ-LQ readings in this table are mean values over several measurements taken using DSLAII on VoIP gateways with an FXS interface)

                G.711 with ITU-Based PLC    G.711 Measured     G.729AB                     G.729AB Measured
                (Ie = 0, Bpl = 25.1)        on VoIP Gateway    (Ie = 11, Bpl = 19)         on VoIP Gateway
Packet Drop %   Ie-eff   R      MOS-LQ      PESQ-LQ Mean       Ie-eff   R      MOS-LQ      PESQ-LQ Mean
0               0        93.2   4.41        4.32               11.0     82.2   4.10        3.77
1               3.6      89.6   4.33        4.08               15.2     78.0   3.95        3.62
2               7.0      86.2   4.24        3.83               19.0     74.2   3.79        3.29
3               10.1     83.1   4.14        3.65               22.5     70.7   3.63        3.16
4               13.1     80.1   4.03        3.43               25.6     67.6   3.48        2.96
5               15.8     77.4   3.92        3.49               28.5     64.7   3.34        2.95
6               18.3     74.9   3.82        3.17               31.2     62.0   3.20        2.58
7               20.7     72.5   3.71        3.22               33.6     59.6   3.08        2.60
8               23.0     70.2   3.61        2.96               35.9     57.3   2.96        2.56
9               25.1     68.1   3.51        2.94               38.0     55.2   2.85        2.38
10              27.1     66.1   3.41        2.76               40.0     53.2   2.74        2.17

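The BurstR parameter is used to capture the burst conditions of the packet loss distribution. BurstR is defined as [ITU-T delayed D.20 (2001), Alexander (2006)]

\[
BurstR = \frac{\text{average length of observed bursts in an arrival sequence}}{\text{average length of bursts expected for the network under random loss}}
\]
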
BurstR is calculated from consecutive packet losses relative to random loss [ITU-T delayed D.20 (2001)]. For random packet loss, BurstR = 1; for bursty packet loss, BurstR > 1; and BurstR < 1 corresponds to scattered losses in the network. BurstR is helpful in measuring the short-term burstiness of packet loss.
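The effect of BurstR on the equipment impairment can be sketched as follows. The function assumes the G.107 bursty-loss form Ie-eff = Ie + (95 – Ie) Ppl / (Ppl / BurstR + Bpl), which reduces to the random-loss expression when BurstR = 1; the function name is illustrative.

```python
def ie_eff_bursty(ie: float, bpl: float, ppl: float, burst_r: float = 1.0) -> float:
    """Effective equipment impairment under bursty loss.

    ie       codec impairment at zero packet loss
    bpl      packet loss robustness factor of the codec
    ppl      packet loss in percent
    burst_r  burst ratio (1 for random loss, above 1 for bursty loss)
    """
    if burst_r <= 0:
        raise ValueError("BurstR must be positive")
    return ie + (95.0 - ie) * ppl / (ppl / burst_r + bpl)


if __name__ == "__main__":
    # G.729AB (Ie = 11, Bpl = 19) at 2% loss for increasing burstiness.
    for burst_r in (1.0, 1.5, 2.0):
        print(burst_r, round(ie_eff_bursty(11.0, 19.0, 2.0, burst_r), 2))
```

For the same average loss percentage, a larger BurstR yields a larger Ie-eff, which matches the statement that bursty drops degrade quality more than random drops.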
20.2.2 Bursty Packet Losses

In the G.113 Amendment [ITU-T-G.113 (2007)], a set of provisional planning values is specified under a bursty loss condition for the G.711 codec with 20-ms packetization, assuming that the PLC employed is repetition/silence. It is also assumed that the packet loss percentage Ppl is < 2%. The proposed Bpl value is reduced to “4.8” from the random-loss value of “25.1,” and BurstR is set to “1.” This assumption is for a specific sample of burst packet loss and may not reflect the impairment caused by burst packet loss in general. As per the G.113 Amendment 2 [ITU-T-G.113 (2007)], it is recommended that under bursty packet loss, the BurstR approach of the E-model Eq. (20.2) [ITU-T-G.107 (2005)] be employed only for codecs with an efficient codec-state-based PLC and a packet loss robustness factor Bpl > 16. The BurstR approach is based on a two-state Markovian packet loss model, and it captures only short-term consecutive loss dependencies [ITU-T-G.107 (2005)]. The BurstR approach and the provisional planning values for Ie-eff given in G.113 [ITU-T-G.113 (2007)] are valid under some constraints. The computation of the equipment impairment factor using the BurstR approach is not yet fully settled in recommendations and standards. Based on the published literature, an overview of packet loss models is presented below.
Packet Loss Models. The mechanisms that lead to packet loss are generally transient in nature. Hence, the packet loss distribution is better represented by a packet loss model than by a simple packet loss count. Packet losses are mainly modeled as random and bursty. Random loss in the network can be modeled by the Bernoulli model. Using this loss model, the probability “P” of a packet loss is estimated by counting the number of lost packets and dividing it by the total number of packets transmitted.
Two-State Models. During bursty conditions, the BurstR approach of the Ie calculation [ITU-T-G.107 (2005)] is based on a two-state Markovian packet loss model. In this model, transitions between a “good” state-0 and a “bad” state-1 are described by state transition probabilities [ITU-T-G.1020 (2006)]. The probability “p” is the probability that a packet is dropped given that the previous packet was received. The probability “q” is the probability that a packet is received given that the previous packet was dropped. “1 – p” is the probability that a packet is received given that the previous packet was also received. “1 – q” is the probability that a packet is lost given that the previous packet was also lost. The parameter BurstR can be derived from the two-state Markov model as [Alexander (2006)]

\[
BurstR = \frac{1}{p + q}
\]

The other popular two-state models are Gilbert and Gilbert-Elliott. In a two-state Markov model, state “1” indicates a loss state. In the Gilbert model, the loss state has an independent loss probability associated with it. In the Gilbert-Elliott model, state “0” also has an independent loss probability associated with it [Alexander (2006)]. These two-state packet loss models, which use transition probabilities, are used to analyze packet loss burstiness in terms of consecutive losses. Hence, they can capture short-term dependencies between packet losses, but they miss the effects of longer periods of high loss density. The Markov-4 model helps with a more detailed analysis of the packet losses.
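The two-state Markov parameters can be estimated directly from a recorded loss trace. The following sketch counts state transitions to estimate p and q and then evaluates BurstR as 1 / (p + q), the form assumed here from the two-state model in G.107; the function names and the boolean trace representation are illustrative.

```python
def estimate_two_state(losses):
    """Estimate (p, q) of a two-state Markov loss model from a loss trace.

    losses: sequence of booleans, True where the packet was lost.
    p: probability of a loss given that the previous packet was received.
    q: probability of a reception given that the previous packet was lost.
    """
    recv_total = recv_to_lost = lost_total = lost_to_recv = 0
    for prev, cur in zip(losses, losses[1:]):
        if prev:
            lost_total += 1
            if not cur:
                lost_to_recv += 1
        else:
            recv_total += 1
            if cur:
                recv_to_lost += 1
    p = recv_to_lost / recv_total if recv_total else 0.0
    # With no observed loss, q = 1 makes the model degenerate to "no loss".
    q = lost_to_recv / lost_total if lost_total else 1.0
    return p, q


def burst_ratio(p: float, q: float) -> float:
    """BurstR of a two-state Markov loss model."""
    if p + q <= 0:
        raise ValueError("p + q must be positive")
    return 1.0 / (p + q)


def mean_loss_percent(p: float, q: float) -> float:
    """Long-run packet loss percentage implied by the model."""
    return 100.0 * p / (p + q)


if __name__ == "__main__":
    trace = [False] * 40 + [True] * 4 + [False] * 40 + [True] * 4 + [False] * 12
    p, q = estimate_two_state(trace)
    print(round(p, 4), round(q, 4), round(burst_ratio(p, q), 2))
```

In the example trace, losses arrive in runs of four packets, so the estimated BurstR is well above 1, which flags the loss pattern as bursty rather than random.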
Markov Four-State Model. A Markov four-state model can be used to capture both very short duration consecutive loss events and longer events of lower loss density. This model classifies packets as part of bursts and gaps, which represent phases of higher and lower packet loss, respectively. The Markov-4 model can also be treated as a combination of two two-state Markov models. As shown in Fig. 20.3(b), the states are identified with gap and burst boundaries. The two two-state Markov models together have four independent transition probabilities. Two more transition probabilities represent the transitions between bursts and gaps. The combination of all these states and probabilities provides a better analysis with the Markov-4 model.
The gap and burst states are assigned based on the number of consecutively received packets between two packet loss events. Whether a sequence belongs to a burst or a gap is decided by comparing the number of consecutively received packets with Gmin. The value of Gmin is 16 [ITU-T-G.1020 (2006)] for voice applications. If a sequence begins and ends with a loss and the number of consecutively received packets within it is less than Gmin, then the two lost packets and the sequence are assumed to be part of a burst. The periods between the bursts, where the number of consecutively received packets is Gmin or more, are regarded as gaps. The Markov four-state model assumes that a packet can be in one of the four states. State transitions occur between the following states:
State-1: packet received successfully in a gap
State-2: packet received within a burst
State-3: packet lost within a burst
State-4: isolated packet loss within a gap

Figure 20.3. Packet loss models. (a) Two-state model. (b) Four-state Markov model.

State-1 and state-4 are part of the gap. A gap indicates good quality and is desired. State-2 and state-3 are part of the burst. A burst has more losses and indicates bad quality. At each packet loss event, the number of received (good) packets since the last packet loss event is noted. This value is used to decide whether the sequence of received packets and the two lost packets are part of a gap or part of a burst. A set of counters is used to track the number of packets in each state as well as the key transition counts. These counters are updated at each packet loss event during the course of a call. After the call is completed, the remaining transition counts are derived and then normalized to provide the desired independent probabilities. The probabilities p11, p14, p41, p13, p22, p23, p33, p32, and p31 marked in Fig. 20.3(b) represent the transition probabilities from one state to another. The complete formulation is available in [ETSI TS 101 329-5 (2000)].
This statistical analysis approach, in terms of individual state transition probabilities, provides the following packet-related statistics [Clark (2001), ETSI TS 101 329-5 (2000)]:
Gap length is the average length (ms) of the gaps detected for the stream.
Gap density is the average packet loss percentage in a gap.
Burst length is the average length (ms) of the bursts detected.
Burst density is the average packet loss percentage in a burst.
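The burst and gap statistics listed above can be approximated directly from a loss trace without building the full Markov-4 counters. The sketch below is a simplified, illustrative classification and not the normative procedure: losses separated by fewer than Gmin received packets are grouped into a burst, a loss with at least Gmin received packets on both sides is treated as an isolated gap loss, and the trace is assumed to be preceded and followed by received packets. The function name, the boolean trace, and the 20-ms packet interval are assumptions.

```python
def burst_gap_stats(losses, gmin: int = 16, packet_ms: float = 20.0) -> dict:
    """Return burst/gap densities (%) and mean lengths (ms) for a loss trace.

    losses: sequence of booleans, True where a packet was lost or discarded.
    """
    loss_idx = [i for i, lost in enumerate(losses) if lost]

    # Group losses that are separated by fewer than gmin received packets.
    clusters = []
    for i in loss_idx:
        if clusters and (i - clusters[-1][-1] - 1) < gmin:
            clusters[-1].append(i)
        else:
            clusters.append([i])

    # A cluster holding a single loss is an isolated loss inside a gap.
    bursts = [c for c in clusters if len(c) >= 2]

    in_burst = [False] * len(losses)
    for c in bursts:
        for i in range(c[0], c[-1] + 1):
            in_burst[i] = True

    burst_pkts = sum(in_burst)
    burst_lost = sum(len(c) for c in bursts)
    gap_pkts = len(losses) - burst_pkts
    gap_lost = len(loss_idx) - burst_lost
    # Each maximal run of non-burst packets counts as one gap.
    n_gaps = sum(1 for i, b in enumerate(in_burst)
                 if not b and (i == 0 or in_burst[i - 1]))

    def pct(num, den):
        return 100.0 * num / den if den else 0.0

    def mean_ms(pkts, count):
        return packet_ms * pkts / count if count else 0.0

    return {
        "burst_density": pct(burst_lost, burst_pkts),
        "gap_density": pct(gap_lost, gap_pkts),
        "burst_length_ms": mean_ms(burst_pkts, len(bursts)),
        "gap_length_ms": mean_ms(gap_pkts, n_gaps),
    }


if __name__ == "__main__":
    trace = [False] * 100
    for i in (10, 12, 15, 60):   # three clustered losses and one isolated loss
        trace[i] = True
    print(burst_gap_stats(trace))
```

In the example, the losses at positions 10, 12, and 15 form one burst of six packets with 50% loss density, while the loss at position 60 stays in the gap as an isolated loss.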
A Markov-4 model is a combination of two two-state models with independent probabilities that correspond to gap and burst. The effective equipment impairment factors under the burst and gap conditions are obtained by using the loss densities obtained in each state, respectively. The equipment impairment factor Ieg for the gap condition is obtained by substituting the gap loss density in place of the packet loss percentage (Ppl) in Eq. (20.1). Similarly, for the burst condition, the impairment factor Ieb is calculated using the burst loss density in place of Ppl in Eq. (20.1). The obtained values of Ieg and Ieb represent the effective impairments under the gap and burst conditions, respectively. As bursts represent periods of high packet loss, the value of Ieb will be high and the R-factor during a burst will be low.
Note on VAD Packets. For determining a lost or discarded packet near the start or end of a Real-Time Transport Protocol (RTP) session, it is assumed that the RTP session is preceded and followed by at least Gmin received packets. Burst parameters calculated over silence zones are not meaningful, because a packet loss during silence does not degrade voice quality. It is assumed that VAD is being used and, hence, that packet loss reports are generated based on speech packets. In practice, several implementations may not use VAD/comfort noise generation (CNG) because more bandwidth is available. In this situation, statistics based on all the available packets may not represent the correct voice quality. Any other information from local silence detection may help in properly estimating the packet loss parameters.
Time-Varying Packet Loss Effects. The packet loss in the network is time varying in nature. As the packet loss behavior varies, the packet sequence switches between gaps (good quality) and bursts (bad quality) during the call. Hence, the call quality varies between good and bad. If the voice quality changes, the listener will be able to report on it only after some time. Assume that the quality level at the change from burst (Ieb) to gap (Ieg) is indicated by I1 and that the quality level at the change from gap (Ieg) to burst (Ieb) is indicated by I2. It is assumed that the equipment impairment factor varies between burst period “b” and gap period “g.” The values of I1 and I2 are expressed as [ETSI TS 101 329-5 (2000)]

\[
I_1 = I_{eb} - (I_{eb} - I_2)\,e^{-b/t_1}, \qquad
I_2 = I_{eg} + (I_1 - I_{eg})\,e^{-g/t_2}
\]

where Ieb is the effective impairment value under the burst condition, Ieg is the effective impairment value under the gap condition, “b” is the burst duration, “g” is the gap duration, and t1 and t2 are time constants. The values of t1 and t2 were arrived at from time-varying assessments, which were subjectively done by a group of users on a 3-minute call in which packet loss was varied from 0% to 25% [Clark (2001)]. In [Clark (2001)], t1 = 5 seconds and t2 = 15 seconds were used. In the more recent publication [Alexander (2006)], the values used are t1 = 9 seconds and t2 = 22 seconds. At this stage, the t1 and t2 values are perception based and are not standardized through recommendations.
The transient nature of the equipment impairment factor during the call [Carvalho et al. (2005)] can be explained as follows. In the transition from a gap period to a burst period, the Ie impairment factor rises exponentially from the exit value I2 toward Ieb with the time constant t1. In the transition from a burst period to a gap period, the Ie impairment factor decays exponentially from the exit value I1 toward Ieg with the time constant t2. Combining the two equations for I1 and I2 gives an expression for I2 in terms of Ieg and Ieb [ETSI TS 101 329-5 (2000)]:

\[
I_2 = \frac{I_{eg}\,\bigl(1 - e^{-g/t_2}\bigr) + I_{eb}\,\bigl(1 - e^{-b/t_1}\bigr)\,e^{-g/t_2}}{1 - e^{-b/t_1 - g/t_2}}
\]

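The average value of the equipment impairment factor is computed by integrating the impairment over the burst and gap durations. The average impairment factor Ie(avg) is expressed as

\[
I_e(\mathrm{avg}) = \frac{b\,I_{eb} + g\,I_{eg} - t_1\,(I_{eb} - I_2)\bigl(1 - e^{-b/t_1}\bigr) + t_2\,(I_1 - I_{eg})\bigl(1 - e^{-g/t_2}\bigr)}{b + g}
\]
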
According to [Carvalho et al. (2005)], the values of I2 and I1 are computed at the end of each burst, and the value of Ie(avg) for that period of the call is calculated. In this way, during the course of a call, the value of Ie(avg) is calculated at the end of each pair of gap and burst periods. The final Ie(avg) is the weighted average of the individual Ie(avg) values, where the weights are the durations in the call to which each Ie(avg) value applies. This value can be used as the effective equipment impairment factor (Ie-eff) in the R-factor equation to calculate the effects of packet loss and the codec on the transmission quality. Many of these packet loss estimation approaches are based on simulations, perceptions, and the published literature. At the time of writing, the treatment of time-varying equipment impairments and quality had not yet been finalized in the form of a standard.
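The time-varying averaging described above can be sketched for one burst and gap pair. The expressions used for I1, I2, and Ie(avg) are assumed from the exponential model of [ETSI TS 101 329-5 (2000)]: the impairment rises toward Ieb with time constant t1 during a burst and decays toward Ieg with time constant t2 during a gap. The function names are illustrative, and the default time constants are the 5-second and 15-second values quoted from [Clark (2001)].

```python
import math


def time_varying_ie(ie_b: float, ie_g: float, b: float, g: float,
                    t1: float = 5.0, t2: float = 15.0):
    """Return (I1, I2, Ie_avg) for one burst of b seconds and gap of g seconds.

    ie_b, ie_g: effective impairments under burst and gap conditions.
    """
    if b <= 0 or g <= 0:
        raise ValueError("burst and gap durations must be positive")
    eb = math.exp(-b / t1)
    eg = math.exp(-g / t2)
    # Steady-state level at the end of a gap (start of the next burst).
    i2 = (ie_g * (1.0 - eg) + ie_b * (1.0 - eb) * eg) / (1.0 - eb * eg)
    # Level reached at the end of the burst.
    i1 = ie_b - (ie_b - i2) * eb
    # Time average of the exponential rise and decay over one burst/gap pair.
    ie_avg = (b * ie_b + g * ie_g
              - t1 * (ie_b - i2) * (1.0 - eb)
              + t2 * (i1 - ie_g) * (1.0 - eg)) / (b + g)
    return i1, i2, ie_avg


def call_average(segments):
    """Duration-weighted average of per-segment (Ie_avg, duration) pairs."""
    total = sum(duration for _, duration in segments)
    return sum(ie * duration for ie, duration in segments) / total


if __name__ == "__main__":
    i1, i2, ie_avg = time_varying_ie(ie_b=60.0, ie_g=15.0, b=2.0, g=20.0)
    print(round(i1, 2), round(i2, 2), round(ie_avg, 2))
```

Because of the slow decay after a burst, the average impairment comes out higher than a simple duration-weighted mean of Ieb and Ieg when bursts are short, which reflects the delayed perception of quality changes.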
Recency Effect. The time location of an impairment within a call can affect the listener’s perception of quality. This effect is known as the recency effect. Recency reflects the way that a listener remembers voice quality or the way a listener forgets reduced quality. As part of the statistical analysis of a voice packet stream, the most recent position of a significant packet loss/drop influences the recency effect. A significant drop is defined as 8 or more packets lost in 16 consecutive packets. The position of the significant loss is used to calculate the recency degradation factor. The tendency of the listener is to remember the most recent events. These events can be modeled with a function that typically decays the recollection over a 30-second interval. Let “y” represent the time delay since the last significant burst; then the equipment impairment factor considering the recency effect can be written as [ETSI TS 101 329-5 (2000)]

\[
I_e(\text{end of call}) = I_e(\mathrm{avg}) + k\,\bigl(I_1 - I_e(\mathrm{avg})\bigr)\,e^{-y/t_3}
\]

where t3 is the decay time constant (typically 30 seconds) and k is a constant (typically 0.7).
The factor “y” can be obtained from the Markov-4 model [ETSI TS 101 329-5 (2000)]. Bursts at the beginning of a voice call are tolerated more. Toward the end of the call, the same bursts are tolerated less when the listener judges the quality. It is therefore important to ensure that bursts become fewer and shorter as the call progresses. A burst is always undesirable. A gap is tolerated, and packet loss concealment can take care of the isolated packet drops within a gap.
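The recency adjustment can be sketched as an exponential blend between the average impairment and the level reached at the end of the last significant burst. The form Ie(avg) + k (I1 – Ie(avg)) exp(–y / t3) and the values k = 0.7 and t3 = 30 seconds are assumptions taken from the commonly quoted version of the [ETSI TS 101 329-5 (2000)] model; the function name is illustrative.

```python
import math


def ie_with_recency(ie_avg: float, i1: float, y: float,
                    k: float = 0.7, t3: float = 30.0) -> float:
    """Equipment impairment at the end of a call, adjusted for recency.

    ie_avg  time-averaged equipment impairment over the call
    i1      impairment level at the end of the last significant burst
    y       seconds elapsed since that burst
    k, t3   assumed weighting constant and decay time constant (seconds)
    """
    if y < 0:
        raise ValueError("time since the last burst cannot be negative")
    return ie_avg + k * (i1 - ie_avg) * math.exp(-y / t3)


if __name__ == "__main__":
    for y in (0, 10, 30, 60, 120):
        print(y, round(ie_with_recency(ie_avg=20.0, i1=45.0, y=y), 2))
```

A burst that ends just before the call finishes (y near 0) pulls the reported impairment strongly toward I1, while a burst that ended a minute or more earlier has little remaining influence.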
20.2.3 Improving Voice Quality Based on the E-Model

The following major guidelines have to be followed to improve voice quality based on E-model calculations:
• Making the front-end and circuit noise of the end-to-end system mimic the PSTN system. As an example, the systems can be made to adhere to the TR-57 reference in North America.
• Reducing end-to-end delays to 50 ms, and limiting them to less than 100 ms in several deployment situations.
• Providing echo rejection that matches or exceeds the delay-based requirements and perceptions.
• Following country-specific deviations, loss planning, and overall loudness rating.
• Using G.711 with small packetization, mainly to help with end-to-end delays.
• Using codecs with a smaller “Ie” if G.711 cannot be used.
• Avoiding the loss of packets irrespective of codec selection.
• Employing packet loss concealment to help recover the voice.