example, a single lost frame in G.729A [46] can lead to perceptible distortions
in the reconstructed speech for up to 20 frames [31,32].
Although the large MTU in packets relative to the frame size provides
new opportunities for LC in the packet-stream layer (see Section 2.4), the LC
mechanisms in the packet-stream and the codec layers may not work well
together. This happens because techniques for recovering lost frames [47]
in codecs often exploit information in the coded speech [48] that may not be
properly encapsulated into packets. For instance, G.729.1 [33], the wide-band
version of G.729A, recovers a single lost frame using multiple layers of infor-
mation from the previous and the next frames received. This LC technique
will not be useful when multiple adjacent frames are encapsulated into a single
packet that is lost. For this LC technique to be effective, the information
for reconstructing a lost frame must be encapsulated in different packets.
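The effect of packetization on this kind of concealment can be illustrated with a small sketch. It assumes a deliberately simplified rule that is not taken from the G.729.1 specification: a lost frame counts as recoverable only if both the previous and the next frames were received. The function names and the frames_per_packet parameter are hypothetical and chosen for illustration only.

```python
def frame_losses(packet_lost, frames_per_packet):
    """Expand a per-packet loss pattern into a per-frame loss pattern."""
    return [lost for lost in packet_lost for _ in range(frames_per_packet)]


def recoverable_frames(frame_lost):
    """Count lost frames whose previous and next frames were both received.

    Simplifying assumption: frames at either end of the sequence are never
    counted as recoverable, because one of their neighbors is missing.
    """
    count = 0
    for i in range(1, len(frame_lost) - 1):
        if frame_lost[i] and not frame_lost[i - 1] and not frame_lost[i + 1]:
            count += 1
    return count


# The same packet-loss pattern under two packetization choices.
packet_lost = [False, True, False, False, True, False]

one_per_packet = frame_losses(packet_lost, frames_per_packet=1)
two_per_packet = frame_losses(packet_lost, frames_per_packet=2)

# Print (recoverable lost frames, total lost frames) for each choice.
print(recoverable_frames(one_per_packet), sum(one_per_packet))
print(recoverable_frames(two_per_packet), sum(two_per_packet))
```

With one frame per packet, every lost frame in this pattern has both neighbors available. With two frames per packet, each packet loss removes two adjacent frames, so none of the lost frames satisfies the assumed recovery rule.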
The Internet Low Bit-rate Codec (iLBC) [42], used in early versions of
Skype, and its extensions [49,50] address this issue by encoding frames into
self-decodable units using a modified CELP coder with increased bit rate.
Although this approach avoids the propagation of internal-state errors after
a loss, distortions are still perceptible unless additional LC mechanisms are
implemented in the packet-stream layer.
Recently, Global IP Sound (GIPS) released the second version of its proprie-
tary iSAC wideband speech codec for use in Skype. Its white paper [43] indi-
cates that iSAC uses an adaptive packet period of 30-60 ms, with an adaptive
bit rate of 10-32 kbps and a separate low complexity mode. Although the
white paper claims that the codec achieves better performance than G.722.2
for comparable bit rates, there is no independent validation of the claim and
no information on its LC capability.
When a speech codec is designed, its performance is commonly
evaluated by comparing the quality of its reconstructed waveforms under
ideal and nonideal network conditions. One common method of generating
nonideal network conditions is to use stochastic models.
Sun and Ifeachor proposed a simple Bernoulli model with independent
packet losses for modeling the loss behavior in VoIP [13]. The model is highly
approximate because Internet packet losses exhibit temporal dependencies
[51], especially for periodic transmissions. Further, speech quality can vary
significantly across different loss patterns with the same average rate [8]. A
second approach based on the Gilbert model has been used for modeling the
loss behavior of Internet traces in IP telephony [8] and multimedia [51] and
for evaluating speech codecs [49,50]. The model is approximate because it
assumes that a packet loss only depends on the loss of the previous packet.
Extended models, such as the n-state Markov chain and the extended Gilbert
model [51], use additional parameters to model the dependency of losses.
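As a rough illustration of why the average loss rate alone is not enough, the following sketch generates loss patterns from an independent Bernoulli model and from a simplified two-state form of the Gilbert model, both tuned to the same average rate. The parameter names p and q, the 5% target rate, and the helper functions are assumptions made for this example; they are not taken from the cited studies.

```python
import random


def bernoulli_losses(n, loss_rate, rng):
    """Independent losses: each packet is lost with probability loss_rate."""
    return [rng.random() < loss_rate for _ in range(n)]


def gilbert_losses(n, p, q, rng):
    """Simplified two-state Gilbert model.

    p: probability that a packet is lost given the previous one was received.
    q: probability that a packet is received given the previous one was lost.
    The stationary loss rate is p / (p + q); the mean burst length is 1 / q.
    """
    lost = False  # assumption: the chain starts in the "received" state
    pattern = []
    for _ in range(n):
        if lost:
            lost = rng.random() >= q  # stay lost with probability 1 - q
        else:
            lost = rng.random() < p
        pattern.append(lost)
    return pattern


def mean_burst_length(pattern):
    """Average length of runs of consecutive lost packets."""
    bursts, run = [], 0
    for lost in pattern:
        if lost:
            run += 1
        elif run:
            bursts.append(run)
            run = 0
    if run:
        bursts.append(run)
    return sum(bursts) / len(bursts) if bursts else 0.0


rng = random.Random(0)
n = 200_000
# Both models target a 5% average loss rate (illustrative values).
bern = bernoulli_losses(n, 0.05, rng)
gil = gilbert_losses(n, p=0.02, q=0.38, rng=rng)

print("Bernoulli:", sum(bern) / n, mean_burst_length(bern))
print("Gilbert:  ", sum(gil) / n, mean_burst_length(gil))
```

Both patterns have roughly the same average loss rate, but the Gilbert pattern groups its losses into longer bursts. This is the kind of difference that a single average rate hides.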
The main deficiency of these models is that they do not consider LC
algorithms in the packet-stream layer, such as redundant piggybacking [32]
and multidescription coding (MDC) [47,52,53]. There are a number of recent
studies on cross-layer designs of codecs [54-58], but none has focused on