The Design of VoIP Systems with High Perceptual Conversational Quality - Ubiquitous Multimedia Computing

Retransmitting speech frames after a network loss has been detected is infeasible in real-time VoIP because of the excessive delays involved and their effect on MED.

Nonredundant LC schemes are generally based on the interleaving of frames during packetization [59]. One approach exploits the fact that shorter distortions are less likely to be perceived: it breaks what would otherwise be one long lost segment into several shorter segments that are close together but not consecutive. This is not strictly an LC technique, because it does not actually recover losses.
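To make the interleaving idea concrete, here is a minimal Python sketch assuming a simple block interleaver (packet j of a block of n packets carries frames j, j + n, j + 2n, ...). This is one common form of interleaving, not necessarily the scheme in [59], and the function names are hypothetical. It compares the longest run of missing frames when a single packet is lost under sequential versus interleaved packetization.

```python
from typing import List, Sequence, Set


def sequential(frames: Sequence[int], frames_per_packet: int) -> List[List[int]]:
    """Baseline packetization: consecutive frames share a packet."""
    return [list(frames[i:i + frames_per_packet])
            for i in range(0, len(frames), frames_per_packet)]


def interleave(frames: Sequence[int], n_packets: int) -> List[List[int]]:
    """Block interleaver: packet j carries frames j, j + n, j + 2n, ..."""
    return [list(frames[j::n_packets]) for j in range(n_packets)]


def longest_gap(packets: List[List[int]], lost: Set[int], n_frames: int) -> int:
    """Longest run of consecutive frames missing after dropping packets in `lost`."""
    received = {f for j, p in enumerate(packets) if j not in lost for f in p}
    longest = run = 0
    for f in range(n_frames):
        run = 0 if f in received else run + 1
        longest = max(longest, run)
    return longest


if __name__ == "__main__":
    frames = list(range(12))                 # 12 frames, 3 per packet -> 4 packets
    seq = sequential(frames, 3)
    inter = interleave(frames, 4)
    print(longest_gap(seq, {1}, 12))         # frames 3, 4, 5 lost back to back
    print(longest_gap(inter, {1}, 12))       # frames 1, 5, 9 lost in isolation
```

Losing packet 1 removes three consecutive frames (a gap of 3) under sequential packetization, but three isolated frames (a gap of 1) under interleaving. The cost is delay: the sender must buffer a whole block of frames before it can send the block's first packet, which adds to MED.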

Another approach is MDC [47,52,53], which generates multiple descriptions with correlated information from the original speech data. This may be hard to do in low bit-rate streams, whose correlated information has largely been removed during coding [47]. Another disadvantage is that the receiver incurs a longer MED, because it must wait for all the descriptions to arrive before declaring that a description is lost.
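A toy illustration of MDC, assuming uncompressed PCM samples and the classic even/odd sample-splitting construction (one of the simplest forms of MDC, not necessarily the ones in [47,52,53]). Each description would travel in its own packet; if one is lost, its samples are interpolated from their neighbors in the other description. The function names are hypothetical.

```python
from typing import List, Optional, Tuple


def split_descriptions(samples: List[float]) -> Tuple[List[float], List[float]]:
    """Two descriptions: the even-indexed and the odd-indexed samples."""
    return samples[0::2], samples[1::2]


def reconstruct(even: Optional[List[float]], odd: Optional[List[float]],
                n: int) -> List[float]:
    """Merge the descriptions; if one is missing (None), interpolate it."""
    if even is None and odd is None:
        raise ValueError("both descriptions lost")
    out = [0.0] * n
    if even is not None:
        out[0::2] = even
    if odd is not None:
        out[1::2] = odd
    if even is not None and odd is not None:
        return out
    missing_start = 0 if even is None else 1
    for i in range(missing_start, n, 2):
        # Average the received neighbors on either side of the missing sample.
        neighbors = [out[k] for k in (i - 1, i + 1) if 0 <= k < n]
        out[i] = sum(neighbors) / len(neighbors) if neighbors else 0.0
    return out


if __name__ == "__main__":
    even, odd = split_descriptions([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
    print(reconstruct(even, None, 6))   # odd description lost
    print(reconstruct(None, odd, 6))    # even description lost
```

The interpolation works only because adjacent PCM samples are strongly correlated. Once a low bit-rate coder has removed that correlation, splitting the coded stream this way leaves little to interpolate from, which is the difficulty described above.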

Redundant LC schemes exploit trade-offs among the redundancy level, the delay for recovering losses from the redundant information, and the quality of the reconstructed speech. They work on the Internet because increasing the packet size, as long as it stays below the MTU [45], does not lead to noticeable increases in the loss rate [36]. They comprise schemes that use partial redundancy and schemes that use full redundancy. Examples employing partial redundancy include layered coding [31-33], unequal error protection (UEP) [37], and redundant MDC [38]. Examples employing full redundancy include FEC (forward error correction) [9,34] and redundant piggybacking [35,36]. An FEC-based LC scheme [15] for VoIP incorporates into its optimization metric the additional delay incurred by redundancy. In our previous work, we used piggybacking as a simple yet effective technique for sending copies of previously sent frames together with new frames in the same packet, without increasing the packet rate [4,10,36].
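The piggybacking idea can be sketched as follows, assuming a fixed redundancy degree d (each packet carries its new frame plus copies of the d preceding frames), an assumed 40-byte IPv4/UDP/RTP header, and a 1500-byte Ethernet MTU. This is a hypothetical illustration with made-up function names, not the implementation in [4,10,36]. `recover` reports, for each recoverable frame, how many packet periods later than normal its earliest surviving copy arrives; this is the recovery delay that redundant LC schemes trade against redundancy.

```python
from typing import Dict, List, Set

# Assumed per-packet overhead: IPv4 (20) + UDP (8) + RTP fixed header (12) bytes.
HEADER_BYTES = 40


def packetize(n_frames: int, degree: int) -> List[List[int]]:
    """Packet k carries new frame k plus copies of frames k-1, ..., k-degree."""
    return [[f for f in range(k, k - degree - 1, -1) if f >= 0]
            for k in range(n_frames)]


def recover(packets: List[List[int]], lost: Set[int]) -> Dict[int, int]:
    """Map each recoverable frame to its extra delay in packet periods,
    taken from the earliest surviving packet that carries it."""
    delay: Dict[int, int] = {}
    for k, packet in enumerate(packets):     # packets scanned in send order
        if k in lost:
            continue
        for f in packet:
            delay.setdefault(f, k - f)       # keep the earliest copy only
    return delay


def fits_mtu(frame_bytes: int, degree: int, mtu: int = 1500) -> bool:
    """A piggybacked packet (1 new frame + `degree` copies) must fit the MTU."""
    return HEADER_BYTES + (degree + 1) * frame_bytes <= mtu


if __name__ == "__main__":
    print(recover(packetize(6, 1), {2, 3}))  # frame 2 is unrecoverable
    print(recover(packetize(6, 2), {2, 3}))  # all frames, some arriving late
    print(fits_mtu(160, 9))                  # ten 160-byte frames are too many
```

With d = 1, losing packets 2 and 3 makes frame 2 unrecoverable, since both packets carrying it are gone, while frame 3 is recovered one packet period late from packet 4. With d = 2, every frame survives the same burst, but frame 2 now arrives two periods late, so the playout schedule must allow for that extra delay. `fits_mtu(160, 9)` is false because ten 160-byte frames (20 ms of G.711 each) plus headers exceed 1500 bytes.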

The main difficulty in using redundant LC schemes is knowing a suitable redundancy level. Adapting it dynamically to network conditions may be either too slow, as in Skype [36], or too conservative [4]. Another consideration is that the redundancy level is application-dependent. Fully redundant piggybacking is suitable for two-party VoIP, but partial redundancy may be needed in multi-party VoIP, where speech frames from multiple clients are encapsulated in the same packet.
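One way to make the choice of redundancy level concrete is the hypothetical rule below, assuming independent (Bernoulli) packet losses at an estimated rate p. With piggyback degree d, a frame is lost only if all d + 1 packets carrying it are lost, which happens with probability p^(d+1), so the rule picks the smallest d that meets a target residual loss rate. It is not the adaptation used in Skype [36] or in [4]. A second helper, also hypothetical and assuming a 40-byte header and a 1500-byte MTU, shows why multi-party VoIP may be limited to partial redundancy: when several clients' frames share a packet, the MTU leaves room for fewer copies.

```python
def min_degree(loss_rate: float, target: float, max_degree: int) -> int:
    """Smallest piggyback degree d with loss_rate ** (d + 1) <= target,
    assuming independent packet losses; capped at max_degree."""
    d = 0
    while loss_rate ** (d + 1) > target and d < max_degree:
        d += 1
    return d


def max_degree_for_mtu(n_clients: int, frame_bytes: int,
                       mtu: int = 1500, header_bytes: int = 40) -> int:
    """Largest degree that fits when n_clients' frames share one packet."""
    frames_per_client = (mtu - header_bytes) // (n_clients * frame_bytes)
    return max(frames_per_client - 1, 0)   # one slot is the new frame itself


if __name__ == "__main__":
    print(min_degree(0.05, 1e-3, max_degree=5))
    print(max_degree_for_mtu(1, 160))      # two-party call, 160-byte frames
    print(max_degree_for_mtu(8, 160))      # eight clients in one packet
```

For p = 0.05 and a target of 10^-3, the rule picks d = 2. With 160-byte frames, a single client leaves room for up to eight copies, whereas eight clients sharing a packet leave room for none. Because Internet losses are often bursty rather than independent, p^(d+1) is optimistic, which is one reason a purely static rule can misjudge the level actually needed.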

Figure 2.9b also summarizes the various POS methods. Because delays and losses are nonstationary and path-dependent, simple schemes with fixed MEDs, whether hardcoded at design time or set during call establishment, do not provide consistent protection against late losses. Adaptive POS schemes that adjust the playout schedule at the talk-spurt or packet level are more prevalent.
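As an example of talk-spurt-level adaptation, the sketch below uses the classic EWMA estimator of Ramjee, Kurose, Towsley, and Schulzrinne (1994), not necessarily one of the schemes in Figure 2.9b. It tracks the network delay d_hat and its variation v_hat for every packet, but recomputes the playout offset d_hat + 4 v_hat only at the start of a talk spurt, so the change can be absorbed by the preceding silence. Assumptions: send and arrival times share a common clock in milliseconds, the first packet starts a talk spurt, and the estimator is seeded with the first observed delay. A packet arriving after its playout time is counted as a late loss.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class DelayEstimator:
    """EWMA estimates of network delay (d_hat) and its variation (v_hat)."""
    alpha: float = 0.998002          # weighting factor from Ramjee et al. (1994)
    d_hat: float = 0.0
    v_hat: float = 0.0
    primed: bool = False

    def update(self, delay: float) -> None:
        if not self.primed:          # assumption: seed with the first observed delay
            self.d_hat, self.primed = delay, True
            return
        a = self.alpha
        self.d_hat = a * self.d_hat + (1 - a) * delay
        self.v_hat = a * self.v_hat + (1 - a) * abs(self.d_hat - delay)


def schedule(packets: List[Tuple[float, float, bool]],
             alpha: float = 0.998002) -> List[Tuple[float, bool]]:
    """packets: (send_ms, arrival_ms, starts_talk_spurt); the first packet
    must start a talk spurt. Returns (playout_ms, is_late_loss) per packet."""
    est = DelayEstimator(alpha=alpha)
    offset = 0.0
    result = []
    for send, arrive, spurt_start in packets:
        est.update(arrive - send)
        if spurt_start:              # change the schedule only between talk spurts
            offset = est.d_hat + 4 * est.v_hat
        playout = send + offset
        result.append((playout, arrive > playout))
    return result


if __name__ == "__main__":
    trace = [(0.0, 50.0, True), (20.0, 80.0, False),
             (40.0, 90.0, False), (100.0, 150.0, True)]
    for playout, late in schedule(trace, alpha=0.5):
        print(playout, late)
```

On this short trace, with alpha = 0.5 chosen only to keep the arithmetic easy to follow, the second packet is scheduled for playout at 70 ms but arrives at 80 ms, so it becomes a late loss. The offset is raised from 50 ms to 58.75 ms only when the next talk spurt begins, where the adjustment falls in a silence period.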

At the talk-spurt level, silence segments can be added or omitted at the beginning of a talk spurt, so that the changes are virtually imperceptible to the listener. Adjustments can also be made for each frame using time-scale modification [40], which stretches or compresses frames without changing its
