The Design of VoIP Systems with High Perceptual Conversational Quality - Ubiquitous Multimedia Computing

pitch period. However, it requires additional computational resources, has

small effects on MEDs, and is generally perceptible.

At the packet level, several studies aim to balance the number of packets

that arrive too late for playout against the jitter-buffer delay

that packets wait before their scheduled playout times. Open-loop schemes

use heuristics for picking some system-controllable metrics (such as MED),

based on the available network statistics [41]. They are less robust because they do

not explicitly optimize a target objective. Moreover, they do not consider the

effects of the codec on speech quality, although their performance depends

on the codec used. Closed-loop schemes with intermediate quality metrics

[39] control an intermediate metric based on the late-loss rate collected in a

window. Their difficulty lies in choosing a good intermediate metric.

Closed-loop schemes with end-to-end quality metrics generally use the E-model [1]

for estimating conversational quality as a function of some objective metrics.

One study uses this estimate in a closed-loop framework to jointly optimize

the POS and FEC-based LC [15]. Another study [13] proposes to use the

E-model but separately trains a regression model for modeling the effects

of the loss rate and the codec on PESQ. These models are limited because,

without a redundancy-based LC scheme, lost frames cannot be recovered by

adjusting the playout delays alone.
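
The closed-loop idea with an end-to-end metric can be illustrated with a minimal sketch. It is not the method of [13] or [15]. It assumes a commonly cited simplified fit of the E-model (R = 94.2 - Id - Ie, with G.711-style coefficients) and the R-to-MOS mapping of ITU-T G.107. The function names, the candidate delays, and the `other_delay_ms` parameter are hypothetical, and MED is taken here to mean the mouth-to-ear delay.

```python
import math

def r_factor(med_ms, loss_rate):
    """Simplified E-model rating R (assumed coefficients, not from this chapter)."""
    # Delay impairment: grows slowly, then faster beyond about 177 ms.
    i_d = 0.024 * med_ms
    if med_ms > 177.3:
        i_d += 0.11 * (med_ms - 177.3)
    # Equipment impairment as a function of the total frame loss rate
    # (assumed fit for a G.711-like codec under random loss).
    i_e = 30.0 * math.log(1.0 + 15.0 * loss_rate)
    return 94.2 - i_d - i_e

def r_to_mos(r):
    """ITU-T G.107 mapping from the rating R to an estimated MOS."""
    if r <= 0:
        return 1.0
    if r >= 100:
        return 4.5
    return 1.0 + 0.035 * r + 7e-6 * r * (r - 60.0) * (100.0 - r)

def best_playout_delay(network_delays_ms, candidates_ms, other_delay_ms=0.0):
    """Return (delay, R) for the candidate playout delay with the highest R.

    network_delays_ms: one-way delays of packets observed in a window.
    A packet counts as late (lost) if its delay exceeds the playout delay.
    other_delay_ms: hypothetical fixed delay (coding, packetization) added to MED.
    """
    best = None
    for p in candidates_ms:
        late = sum(1 for x in network_delays_ms if x > p) / len(network_delays_ms)
        r = r_factor(p + other_delay_ms, late)
        if best is None or r > best[1]:
            best = (p, r)
    return best
```

For instance, with ninety packets at 50 ms and ten at 150 ms, this sketch prefers a 160 ms playout delay over 60 ms, because a 10% late-loss rate costs more rating points than 100 ms of extra delay under these coefficients. Note that the sketch only trades late losses against delay; packets lost in the network stay lost, which is the limitation stated above.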

Existing VoIP systems usually employ redundancy-based LC algorithms

for recovering losses when using UDP. However, none of these approaches

considers delay-quality trade-offs for delivering VoIP of high perceptual

quality to users. Previous LC algorithms based on analytic loss models

[39,41] do not always perform well, as these models may not fully capture the

dynamic network behavior and do not take into account the LC strategies in

codecs. Existing POS algorithms based on open-loop heuristic functions [41]

may not be robust under all conditions, whereas closed-loop approaches [39]

are difficult to optimize without a good intermediate metric. Some recent

approaches [13,15] have employed an end-to-end objective metric, such as

the E-model, as their intermediate metric. There are also very few reported

results on POS algorithms for multi-party VoIP [60].
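
To make the delay-quality trade-off of redundancy-based LC concrete, the following minimal sketch assumes a simple piggybacking layout in which, for redundancy degree `degree`, frame i is carried in packets i, i+1, ..., i+degree-1. This layout and the function name are assumptions for illustration; the schemes cited above may organize redundancy differently.

```python
def unrecovered_frames(packet_lost, degree):
    """Return the indices of frames that no received packet carries.

    packet_lost[i] is True if packet i was lost in the network.
    degree is the number of consecutive packets carrying each frame
    (degree == 1 means no redundancy). Assumed piggybacking layout.
    """
    n = len(packet_lost)
    missing = []
    for i in range(n):
        # Packets that carry frame i, truncated at the end of the stream.
        carriers = range(i, min(i + degree, n))
        if all(packet_lost[j] for j in carriers):
            missing.append(i)
    return missing
```

Under this layout, a frame recovered from a later packet becomes available up to degree - 1 packet intervals after its original packet would have arrived. A higher redundancy degree therefore recovers longer loss bursts only if the playout delay is large enough to wait for the redundant copies, which is why LC and POS need to be chosen together.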

We present in the next section new LC and POS control algorithms that

address the trade-offs related to conversational quality. With the learned

classifier, we use run-time network and conversational conditions to select the

best operating point of these control algorithms. A related problem studied

is the equalization of MSs for improving perceptual quality in multi-party

VoIP. We also consider the design of these algorithms jointly with that of the

codec and the network-control algorithms in multi-party VoIP.

2.4.2 Packet-Stream Layer LC and POS Algorithms

Two-Party LC and POS Schemes. We have developed new POS/LC schemes

for dynamically selecting a playout schedule for each talk spurt [4] and an

appropriate redundancy degree for each packet. Using the loss information
