The Design of VoIP Systems with High Perceptual Conversational Quality - Ubiquitous Multimedia Computing

equalization. Each element of the training set, therefore, consists of a mapping from the system-controlled parameter values and the objective metrics of the simulated conditions on a pair of conversations to the subjective preference between them. This method ensures that the conversations compared differ by only one parameter value and that the subjective preference can be attributed to the system-controlled value that leads to that opinion. We then learn an SVM classifier using training data based on the results of the subjective tests and the conditions under which the tests are conducted.
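
As a rough illustration of what one training element might look like, the sketch below pairs the condition parameters with the two system-controlled values being compared and a preference label. The field names, the units, the choice of MED as the single parameter that differs, and the +1/-1/0 label encoding are all assumptions made for this example; the text does not specify them.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Conditions:
    """Objective metrics of one simulated condition (names and units assumed)."""
    loss_rate: float        # fraction of packets lost
    delay_ms: float         # network delay
    jitter_ms: float        # delay jitter
    switching_freq: float   # speaker switches per minute
    single_talk_s: float    # average single-talk duration


@dataclass(frozen=True)
class PairExample:
    """Two conversations under the same conditions that differ in one value."""
    conditions: Conditions
    med_a_ms: float         # system-controlled value used in conversation A
    med_b_ms: float         # system-controlled value used in conversation B
    preference: int         # assumed encoding: +1 A preferred, -1 B, 0 neither

    def features(self) -> List[float]:
        c = self.conditions
        return [c.loss_rate, c.delay_ms, c.jitter_ms,
                c.switching_freq, c.single_talk_s,
                self.med_a_ms, self.med_b_ms]


def to_training_arrays(
    examples: List[PairExample],
) -> Tuple[List[List[float]], List[int]]:
    """Flatten examples into a feature matrix and a label vector."""
    X = [e.features() for e in examples]
    y = [e.preference for e in examples]
    return X, y
```

Arrays of this shape could then be handed to any SVM implementation for training.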

At run-time, the parameters representing the current conditions are estimated and input to the SVM. For example, in the design of the POS algorithm for two-party VoIP, loss, delay, and jitter parameters are used to represent network conditions, and switching frequency and single-talk duration parameters represent conversational conditions. The learned SVM outputs the subjective preference for a given pair of points on the operating curve that corresponds to the network and conversational conditions observed. Its predictions on the subjective preference between multiple pairs of points on the same operating curve are combined using the statistical method described earlier in order to identify the optimal MED value, which is then used by the POS algorithm to adjust the jitter-buffer delay in order to achieve the operating point with the highest subjective quality.
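
The run-time step can be sketched as follows. This is only an illustration: the statistical method used to combine the pairwise predictions is described earlier in the chapter and is not reproduced here, so the sketch substitutes a simple count of pairwise wins. The `predict` callable stands in for the learned SVM, and `toy_predictor`, the candidate MED values, and the condition vector are hypothetical.

```python
from itertools import combinations
from typing import Callable, Dict, Sequence

# Hypothetical predictor signature: (conditions, med_a, med_b) -> +1, -1, or 0,
# where +1 means the operating point with med_a is preferred.
Predictor = Callable[[Sequence[float], float, float], int]


def select_med(conditions: Sequence[float],
               candidates: Sequence[float],
               predict: Predictor) -> float:
    """Pick the candidate MED that wins the most pairwise comparisons.

    Win counting is a stand-in for the chapter's statistical method.
    """
    wins: Dict[float, int] = {m: 0 for m in candidates}
    for a, b in combinations(candidates, 2):
        p = predict(conditions, a, b)
        if p > 0:
            wins[a] += 1
        elif p < 0:
            wins[b] += 1
    # On a tie, max() keeps the earliest candidate in the sequence.
    return max(candidates, key=lambda m: wins[m])


def toy_predictor(conditions: Sequence[float],
                  med_a: float, med_b: float) -> int:
    """Toy stand-in for the SVM: prefers the MED closer to 200 ms."""
    da, db = abs(med_a - 200.0), abs(med_b - 200.0)
    return (da < db) - (da > db)


# Assumed order: loss, delay, jitter, switching frequency, single-talk duration.
current = [0.05, 80.0, 20.0, 0.5, 2.0]
best = select_med(current, [100.0, 150.0, 200.0, 250.0, 300.0], toy_predictor)
```

The selected value would then be passed to the playout scheduler to set the jitter-buffer delay.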

2.3 Cross-Layer Speech Codecs for VoIP

Traditional codecs developed for cellular communications and PSTN calls are not suitable for VoIP because they were designed for circuit switching under low bandwidth, fixed bit rates, and random bit errors. These codecs are not effective in packet-switched networks, whose loss rates and delay jitters are dynamic. Some recent codecs have been developed specifically for VoIP applications; they can encode wide-band speech and exploit trade-offs between bit rate and delay in order to be more robust against bursty losses. However, they have been designed without due consideration of LC strategies in other layers of the protocol stack. Without such consideration, the LC strategies in these codecs can be either inadequate, giving subpar performance, or redundant and unnecessary.

In this section, we first briefly survey speech codecs designed for VoIP. We then present the design of cross-layer speech codecs, which is carried out in conjunction with LC strategies in the packet-stream layer.

2.3.1 Previous Work on Speech Codecs

Speech codecs were traditionally designed for applications in cellular and PSTN communications. With the proliferation of IP networks, they have been increasingly used in VoIP. They can be classified based on their coding
