The Design of VoIP Systems with High Perceptual Conversational Quality - Ubiquitous Multimedia Computing


2.5 Network-Layer Support for VoIP

Several standardized and proposed transport protocols, such as TCP, UDP,
and RTP, were designed with different trade-offs between reliable delivery
of packets and application-layer end-to-end delay. By providing hooks for
synchronization, reliability, QoS feedback, and flow control, RTP can
support real-time multimedia applications [61-63]. RTP itself does not
guarantee the real-time delivery of continuous media; instead, it relies on
resource-reservation protocols, such as RSVP [64], for resource scheduling.
RTP can be used in conjunction with the mechanisms described in this
chapter to achieve high conversational quality. Another approach is to
design end-to-end protocols that are more TCP friendly [65,66]. These
protocols will need to be extended in order to address the trade-offs in
conversational quality.
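
The synchronization and loss-detection hooks mentioned above live in the fixed 12-byte RTP header (version, marker, payload type, sequence number, timestamp, and SSRC). The following is a minimal sketch of packing and parsing that header, assuming no padding, header extension, or CSRC entries. The function names and the loss-counting helper are illustrative and are not taken from the chapter.

```python
import struct

RTP_VERSION = 2
HEADER_FORMAT = "!BBHII"  # V/P/X/CC, M/PT, sequence number, timestamp, SSRC
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)  # 12 bytes


def pack_rtp_header(seq, timestamp, ssrc, payload_type=0, marker=False):
    """Build a fixed RTP header with P = 0, X = 0, and CC = 0."""
    byte0 = RTP_VERSION << 6
    byte1 = (int(marker) << 7) | (payload_type & 0x7F)
    return struct.pack(HEADER_FORMAT, byte0, byte1,
                       seq & 0xFFFF, timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF)


def unpack_rtp_header(packet):
    """Parse the first 12 bytes of a packet into a dictionary of header fields."""
    byte0, byte1, seq, timestamp, ssrc = struct.unpack(
        HEADER_FORMAT, packet[:HEADER_SIZE])
    return {
        "version": byte0 >> 6,
        "marker": bool(byte1 >> 7),
        "payload_type": byte1 & 0x7F,
        "seq": seq,
        "timestamp": timestamp,
        "ssrc": ssrc,
    }


def count_lost(seqs):
    """Count missing packets from sequence numbers listed in arrival order.

    Illustrative helper: handles 16-bit wraparound, ignores reordering.
    """
    lost = 0
    for prev, cur in zip(seqs, seqs[1:]):
        gap = (cur - prev) & 0xFFFF
        if 1 < gap < 0x8000:
            lost += gap - 1
    return lost
```

A receiver can use the sequence number to detect loss and the timestamp to schedule playout. With an 8 kHz clock, for example, consecutive 20 ms frames differ by 160 timestamp ticks. Neither field guarantees timely delivery, which is why the mechanisms in this chapter are still needed on top of RTP.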

In multi-party VoIP, each current speaker needs to convey his or her speech
information to all the listeners. A good design should accommodate dynamic
and diverse network conditions among the clients. The protocol and the
connection topology used are generally dictated by where audio mixing is
done [67]. When a VoIP client (as in Skype [68]) or a bridge (as in QQTalk
[69]) is responsible for decoding, mixing, and encoding the signals from the
clients, it is natural for all the clients to send their packets to that
centralized site to be mixed and forwarded. This centralized approach may
not scale because it can create a bottleneck near that site. Moreover, the
maximum end-to-end delay (ME2ED) between any speaker-listener pair can be
large when the clients are geographically distributed [10].
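
To make the ME2ED concern concrete, the sketch below computes the worst speaker-to-listener delay when one client acts as the central mixing site. The delay matrix is hypothetical (two nearby pairs of clients separated by a long link), the function names are illustrative, and processing and jitter-buffer delays at the hub are ignored.

```python
# Hypothetical one-way delays in ms: DELAY_MS[a][b] is the delay from a to b.
DELAY_MS = [
    [0, 20, 150, 160],
    [20, 0, 140, 150],
    [150, 140, 0, 30],
    [160, 150, 30, 0],
]


def centralized_me2ed(delay, hub):
    """Largest speaker-to-listener delay when all audio is relayed via `hub`.

    The hub is itself a client, so a path that starts or ends at the hub
    takes a single hop; every other path takes two hops through the hub.
    """
    clients = range(len(delay))
    worst = 0
    for speaker in clients:
        for listener in clients:
            if speaker == listener:
                continue
            if speaker == hub or listener == hub:
                d = delay[speaker][listener]
            else:
                d = delay[speaker][hub] + delay[hub][listener]
            worst = max(worst, d)
    return worst


def best_hub(delay):
    """Pick the hub that minimizes ME2ED (first one wins on ties)."""
    return min(range(len(delay)), key=lambda h: centralized_me2ed(delay, h))


def direct_me2ed(delay):
    """Largest delay if every speaker could reach every listener directly."""
    n = len(delay)
    return max(delay[s][l] for s in range(n) for l in range(n) if s != l)
```

With this matrix, even the best hub gives an ME2ED of 290 ms, against 160 ms for direct paths, because the two clients far from the hub must exchange speech over two long hops although they are only 30 ms apart.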

On the other hand, a distributed approach asks each client to independently
manage its transmissions. One way is for each speaker to multicast the
speech information to all listeners. Although multicast is available on the
Internet [70], support for reliable real-time multicast to receivers with
different loss and delay behavior is still very preliminary [71]. The focus
in the IETF working group on a NACK-based asynchronous-layered coding
protocol with FEC [72] is inadequate for multi-party VoIP.

A hybrid approach is to have an overlay network [73] that uses a subset of
the clients to manage the mixing and forwarding of unicast packets. The
approach achieves a shorter ME2ED than a centralized approach and a smaller
number of unicast messages than a fully distributed approach. One issue,
however, is that it is complex for the overlay clients to coordinate the
decoding, mixing, and encoding of the speech signals. Alternatively, we have
taken the approach of having the overlay clients simply encapsulate the
speech frames from multiple clients into a single packet before forwarding
them [10]. Encapsulation is feasible as long as the frames to be
encapsulated fit within the MTU of a single packet.
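
The encapsulation step can be sketched as follows. The per-frame entry header (a 2-byte client id and a 2-byte frame length) and the payload budget are assumptions made for illustration; the chapter does not specify a wire format. The budget assumes a 1500-byte Ethernet MTU minus IPv4, UDP, and RTP headers.

```python
import struct

# Hypothetical entry header: client id and frame length, both 16-bit.
ENTRY_HEADER = struct.Struct("!HH")
# Assumed budget: 1500-byte MTU minus IPv4 (20), UDP (8), and RTP (12) headers.
MAX_PAYLOAD = 1500 - 20 - 8 - 12


def encapsulate(frames, max_payload=MAX_PAYLOAD):
    """Concatenate (client_id, frame_bytes) pairs into one payload.

    Frames are copied as-is, with no decoding or mixing.
    Raises ValueError if the payload would exceed max_payload.
    """
    parts = []
    for client_id, frame in frames:
        parts.append(ENTRY_HEADER.pack(client_id, len(frame)))
        parts.append(frame)
    payload = b"".join(parts)
    if len(payload) > max_payload:
        raise ValueError(
            f"{len(payload)} bytes exceed the {max_payload}-byte budget")
    return payload


def decapsulate(payload):
    """Split a payload back into a list of (client_id, frame_bytes) pairs."""
    frames = []
    offset = 0
    while offset < len(payload):
        client_id, length = ENTRY_HEADER.unpack_from(payload, offset)
        offset += ENTRY_HEADER.size
        frames.append((client_id, payload[offset:offset + length]))
        offset += length
    return frames
```

Under these assumptions, a 20-byte frame (for example, 20 ms of speech coded at 8 kb/s) costs 24 bytes per stream, so the budget is rarely binding for low-bit-rate coders. It tightens quickly for uncompressed or wideband frames.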

To design a topology with proper trade-offs between ME2ED among the clients
and the maximum number of packets relayed in a single packet period by any
client, we have studied a commonly used overlay topology
