silences and an overlay network for multi-party VoIP systems. The approach
leads to multi-party conversations with high listening-only speech quality
(LOSQ) and balanced mutual silences.
2.1 Introduction
This chapter summarizes results on real-time two-party and multi-party
VoIP (voice-over-IP) systems that can achieve high perceptual conversational
quality. It focuses on a fundamental understanding of conversational quality
and of the trade-offs it imposes on the design of speech codecs and on
strategies for network control and loss concealment. An important aspect of
this research is the development of new methods for reducing the large
number of subjective tests and for automated learning and generalization of
the results of subjective evaluations. Because network delays in VoIP can be
long and time varying, the design of a VoIP system differs from that of the
public switched telephone network (PSTN), which has short and consistent
delays [1].
Figure 2.1 outlines the components in the design of a VoIP system. The
first component, conversational quality, entails studying human
conversational behavior, modeling conversational dynamics, and identifying
user-perceptible attributes that affect quality. It also includes the design
of off-line subjective tests and of algorithms for learning the test results.
Next, the study of network and conversational environments entails
identifying objective metrics that characterize network and conversational
conditions and disseminating this information at run time. The design of
loss-concealment (LC) and playout scheduling (POS) strategies in the
packet-stream layer involves delay-quality trade-offs that optimize
user-perceptible attributes. The network-control layer provides support for
network transport and admission control in multi-party VoIP. Lastly, the
design of the speech codec and its LC and compression capabilities must take
into account its interactions with the LC and POS strategies in the
packet-stream layer.
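The delay-quality trade-off in playout scheduling can be illustrated with a
minimal sketch. It assumes the simplest possible scheduler, a fixed playout
delay, which is not necessarily the POS strategy studied in this chapter. The
function name and the per-packet delay values are hypothetical and chosen
only for illustration.

```python
from typing import List


def late_loss_fraction(delays_ms: List[float], playout_delay_ms: float) -> float:
    """Fraction of packets that miss a fixed playout deadline.

    delays_ms holds the one-way network delay of each packet (hypothetical
    data). A packet whose delay exceeds the playout delay arrives too late to
    be played and must be handled by loss concealment.
    """
    if not delays_ms:
        raise ValueError("need at least one packet delay")
    late = sum(1 for d in delays_ms if d > playout_delay_ms)
    return late / len(delays_ms)


# Hypothetical per-packet network delays in milliseconds, with two jitter spikes.
delays = [40, 42, 45, 41, 120, 43, 44, 95, 42, 40]

# A longer playout delay hides more jitter but adds to the end-to-end delay.
for budget in (50, 100, 150):
    loss = late_loss_fraction(delays, budget)
    print(f"playout delay {budget} ms -> late loss {loss:.0%}")
```

In this toy trace, raising the playout delay from 50 ms to 150 ms removes all
late losses, but every speech segment is then heard 100 ms later. A scheduler
has to balance these two effects.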
Effects of Delays on Conversations. In a two-party conversation, the
participants take turns speaking and listening [3,5], and both perceive a
silence period (called mutual silence, or MS) when the conversation switches
from one party to the other. Hence, a conversation consists of alternating
speech segments and silence periods.
In a face-to-face setting, both participants share a common reality of the
conversation: one speech segment is separated from the next by a silence
period that both perceive identically. However, when the same conversation
is conducted over the Internet, the participants' perceptions of the
conversation differ because of the delays, jitter, and losses incurred by
the speech segments during transmission [4,5].
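How delay alone makes the two perceptions diverge can be worked out with a
short sketch. It assumes a simplified model with fixed one-way delays and no
jitter or loss. The class, function, and parameter names are hypothetical and
the numbers are illustrative, not measurements from the chapter.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Turn:
    """One change of speaker from A to B (all times in milliseconds)."""
    a_stops_speaking: float  # instant at which A finishes the speech segment
    b_response_gap: float    # silence B leaves after hearing A finish


def perceived_ms(turn: Turn, delay_a_to_b: float, delay_b_to_a: float) -> Tuple[float, float]:
    """Return (MS perceived by A, MS perceived by B) under fixed one-way delays."""
    b_hears_a_end = turn.a_stops_speaking + delay_a_to_b
    b_starts_speaking = b_hears_a_end + turn.b_response_gap
    a_hears_b_start = b_starts_speaking + delay_b_to_a
    ms_a = a_hears_b_start - turn.a_stops_speaking
    ms_b = b_starts_speaking - b_hears_a_end
    return ms_a, ms_b


turn = Turn(a_stops_speaking=1000.0, b_response_gap=300.0)

# Face to face (zero delay): both perceive the same 300 ms silence.
print(perceived_ms(turn, 0.0, 0.0))

# Over a network with 150 ms and 200 ms one-way delays: A perceives 650 ms,
# because the silence A hears also contains the round-trip delay.
print(perceived_ms(turn, 150.0, 200.0))
```

Under this model, the party who has just finished speaking perceives a mutual
silence that is longer by the sum of the two one-way delays, whereas the
responding party perceives only its own response gap.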
Richards [6] has identified three factors that influence the quality of service
in telephone systems: difficulty in listening to one-way speech, difficulty in
 