designs for real-time VoIP. To provide efficient use of resources and the best
conversational quality, the LC mechanisms in the packet-stream and the
codec layers must be developed in a coupled fashion.
To optimize perceptual quality, our survey identifies the need to design
the LC mechanism in codecs jointly with that of the packet-stream layer. This means
that the encoding of speech into frames must dynamically adapt to the
packet rate, which in turn adapts to network congestion. In the next section,
we describe our approach to improve the design of codecs for VoIP.
2.3.2 Cross-Layer Designs of Speech Codecs
Codecs with Self-Decodable Units. To avoid the propagation of errors in
internal states across packet boundaries and to maximize coding efficiency
within a packet, we have designed codecs that encode frames in such a way
that they are self-decodable but may have dependent internal states when
encapsulated into a packet. This is similar to what was done in iLBC [42].
In addition, these codecs can operate in multiple modes, with the frame size
and packet period selected by the SVM learned in the packet-stream layer to
optimize conversational quality.
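
As a rough illustration of what a multi-mode codec interface might look like, the sketch below enumerates the modes compatible with a packet period chosen by the packet-stream layer. The names CodecMode and modes_for_period and the candidate frame sizes (10, 20, and 30 ms) are hypothetical and are not taken from the designs described here; the only rule assumed is that a packet holds a whole number of frames.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CodecMode:
    """One operating mode: a frame size paired with a packet period (hypothetical)."""
    frame_ms: int
    packet_period_ms: int

    @property
    def frames_per_packet(self) -> int:
        # A packet carries a whole number of frames by construction.
        return self.packet_period_ms // self.frame_ms


def modes_for_period(packet_period_ms, frame_sizes_ms=(10, 20, 30)):
    """List the modes whose frame size divides the requested packet period.

    frame_sizes_ms is an assumed candidate set, not a property of any codec
    discussed in the text.
    """
    return [
        CodecMode(frame_ms, packet_period_ms)
        for frame_ms in frame_sizes_ms
        if packet_period_ms % frame_ms == 0
    ]


if __name__ == "__main__":
    for mode in modes_for_period(60):
        print(mode.frame_ms, "ms frames,", mode.frames_per_packet, "per packet")
```

In such an arrangement, the packet-stream layer would pick the packet period and the codec would then choose among the returned modes.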
Based on the ITU G.729 CS-ACELP speech codec operating at 8 kbps, we
have developed cross-layer designs with redundancy-based LC. In our first
design [31], we have increased the frame length in order to reserve space for
redundancies at the packet level, without increasing the bit rate. It uses MDC
to conceal losses at the packet level and adapts to dynamic loss conditions,
while maintaining an acceptable end-to-end delay. By protecting only the
most important excitation parameters of each frame according to its speech
type, our approach enables more efficient usage of the bit budget. In our sec-
ond design [32], we have developed a variable bit-rate layered coding scheme
that dynamically adapts to the characteristics of the speech encoded and the
network loss conditions. To cope with bursty losses while maintaining an
acceptable end-to-end delay, our scheme employs layered coding with
redundant piggybacking of perceptually important parameters in the base layer,
with a degree of redundancy adjusted according to feedback from receivers.
Under various delay constraints, we study trade-offs between the additional
bit rate required for redundant piggybacking and the protection of
perceptually important parameters. Although these cross-layer designs incorporate
LC information in the packet-stream layer, G.729 is not an ideal codec because
it suffers from the propagation of errors in internal states across packet
boundaries. We have also applied a similar approach to the design of the G.722.2
and G.729.1 wide-band speech codecs that have self-decodable frames.
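
The bit-budget argument behind the first design, and the delay limit on piggybacking in the second, can be made concrete with a small sketch. The 8 kbps rate is from the text, and standard G.729 uses 10 ms frames of 80 bits. Everything else is an assumption for illustration only: the function names, the 20 ms frame coded in 120 bits, and the rule that a piggybacking degree d delays recovery by d packet periods.

```python
G729_BIT_RATE_BPS = 8000  # G.729 CS-ACELP rate stated in the text


def bits_per_packet(packet_period_ms, bit_rate_bps=G729_BIT_RATE_BPS):
    """Total bits a packet may carry if the stream bit rate is held fixed."""
    return bit_rate_bps * packet_period_ms // 1000


def redundancy_bits(packet_period_ms, primary_bits_per_frame, frames_per_packet):
    """Bits left over for redundancy once the primary frames are placed."""
    budget = bits_per_packet(packet_period_ms)
    used = primary_bits_per_frame * frames_per_packet
    if used > budget:
        raise ValueError("primary frames exceed the fixed-rate packet budget")
    return budget - used


def piggyback_degree(observed_burst_len, packet_period_ms, delay_slack_ms):
    """Pick a piggybacking degree (assumed rule, not the authors' algorithm).

    The degree tries to cover the burst length reported by the receiver,
    capped by how many packet periods of extra delay can be tolerated.
    """
    max_by_delay = delay_slack_ms // packet_period_ms
    return max(0, min(observed_burst_len, max_by_delay))


if __name__ == "__main__":
    # Standard G.729 framing: six 10 ms frames of 80 bits fill a 60 ms packet.
    print(redundancy_bits(60, 80, 6))
    # Hypothetical longer frames: three 20 ms frames of 120 bits each.
    print(redundancy_bits(60, 120, 3))
    # Burst of 3 packets reported, 30 ms packets, 60 ms of delay slack.
    print(piggyback_degree(3, 30, 60))
```

With standard framing the 60 ms packet has no spare bits, whereas the hypothetical longer frames free 120 bits per packet at the same 8 kbps. This is the kind of space that could carry redundant copies of the most important parameters.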
Cross-Layer Designs of Speech Codecs. To facilitate the generation of
information for effective LC, the encoder needs to know the amount of payload in
each packet available for carrying the LC information. This is important in
multi-party VoIP when multiple voice streams have to be encapsulated in the
same packet and the payload for carrying redundant information is limited.
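
A minimal sketch of the payload accounting this implies: given a shared packet payload and the primary frame sizes of the encapsulated streams, compute how many bytes each stream's encoder may use for LC information. The function name lc_budget_per_stream and the equal-split policy are assumptions made for illustration; the text does not specify how the spare payload is divided.

```python
def lc_budget_per_stream(payload_bytes, primary_bytes):
    """Split the spare payload of a shared packet among its voice streams.

    payload_bytes: total payload available in the packet.
    primary_bytes: primary (non-redundant) frame bytes for each stream.
    Returns the LC byte budget per stream, using an equal split with any
    remainder given to the first streams (assumed policy).
    """
    spare = payload_bytes - sum(primary_bytes)
    if spare < 0:
        raise ValueError("primary frames do not fit in the packet payload")
    n = len(primary_bytes)
    if n == 0:
        return []
    base, extra = divmod(spare, n)
    return [base + (1 if i < extra else 0) for i in range(n)]


if __name__ == "__main__":
    # Three streams of 60 bytes each sharing a 200-byte payload.
    print(lc_budget_per_stream(200, [60, 60, 60]))
```

Each encoder would then be told its budget before generating redundant information, so that the combined packet never exceeds the payload limit.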