techniques. Waveform codecs, such as ITU G.711 and G.726 [11], were designed
to reconstruct a sample-wise waveform as closely as possible. Parametric
codecs, such as G.722.2, G.723.1, G.728, and G.729A [11], model the production
of speech in order to reconstruct a waveform that perceptually resembles the
original speech. Hybrid codecs, such as G.729.1 [11], iLBC [44], and iSAC [43],
combine techniques from both. Under no-loss conditions, the perceptual
quality of a codec is a function of its coding technique and bit rate. However,
it is difficult to compare codecs under loss conditions.
Parametric and hybrid codecs are popular in VoIP because they have lower
bit rates and better perceptual quality. Their design sets the frame size and
the frame period to trade off robustness, quality, and algorithmic delay
[44]. Frame size generally varies between 10 bytes (G.729A)
and 80 bytes (G.729.1, 32-kbps wideband mode), with multimode codecs having
a wide range. (For example, G.722.2 has frames between 17 and 60 bytes.)
Frame period varies between 10 ms (G.729A) and 60 ms (iSAC with 30-60 ms
adaptive size). The most common periods are 20 ms (such as iLBC 15.2-kbps
mode, G.729.1, and G.722.2) and 30 ms (iLBC 13.3-kbps mode and G.723.1). In
general, a larger frame with a shorter period achieves higher quality within
the multiple modes of a codec or within a family of codecs with similar coding
techniques. However, a longer frame and look-ahead window may incur
more algorithmic delay.
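
The frame sizes and frame periods quoted above are linked through the codec bit rate: bit rate = 8 x frame size / frame period. The following is a minimal Python sketch of that arithmetic, using only figures given in the text. The helper names `bit_rate_kbps` and `frame_size_bytes` are illustrative and do not come from any codec library.

```python
def bit_rate_kbps(frame_bytes: float, frame_period_ms: float) -> float:
    """Bit rate in kbps for a codec emitting one frame per frame period.

    Bits per millisecond is numerically equal to kilobits per second.
    """
    return frame_bytes * 8 / frame_period_ms


def frame_size_bytes(rate_kbps: float, frame_period_ms: float) -> float:
    """Frame size in bytes implied by a bit rate and a frame period."""
    return rate_kbps * frame_period_ms / 8


if __name__ == "__main__":
    # G.729A: 10-byte frames every 10 ms (figures from the text).
    print("G.729A bit rate (kbps):", bit_rate_kbps(10, 10))
    # G.729.1 in its 32-kbps mode with a 20-ms frame period.
    print("G.729.1 frame size (bytes):", frame_size_bytes(32, 20))
    # iLBC in its 15.2-kbps mode with a 20-ms frame period.
    print("iLBC frame size (bytes):", frame_size_bytes(15.2, 20))
```

With these figures, the 10-byte, 10-ms frame of G.729A corresponds to 8 kbps, and the 32-kbps, 20-ms mode of G.729.1 corresponds to the 80-byte frame cited above.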
Figure 2.9a summarizes some of the LC techniques employed in speech
codecs. Receiver-based schemes that do not use redundant information can
be classified into sample-based and model-based schemes. In early systems,
silence or comfort-noise substitution [24] or repetition of the previous frame
[26] was proposed in place of a lost frame. Also proposed was the transmission
of even and odd samples in separate packets, with sample-based
interpolation used when a packet was lost [25]. Early model-based schemes simply
repeat the codec parameters of the last successfully received frame [28]. Later,
interpolation of codec parameters from the previous and the next frame was
proposed [27]. Other schemes utilized the information about the voiced-
unvoiced properties of a speech frame to apply specialized LC for reducing
the perception of degradation [29,30]. Schemes that require the cooperation
of the sender and the receiver utilize partial redundant information [31-33]
that is made available by the packet-stream-level LC (see Section 2.4).
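
The receiver-based schemes described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the method of any cited scheme or codec: a frame is modeled as a plain list of numbers (samples or codec parameters), a lost frame is marked as `None`, and the function names `conceal`, `split_even_odd`, and `rebuild_from_even` are hypothetical. Interpolation is linear between the nearest received frames, which is one simple choice among many.

```python
from typing import List, Optional, Sequence, Tuple

Frame = List[float]


def conceal(frames: Sequence[Optional[Frame]], strategy: str = "repeat") -> List[Frame]:
    """Replace lost frames (None) by silence, repetition, or interpolation."""
    if strategy not in ("silence", "repeat", "interpolate"):
        raise ValueError("unknown strategy: " + strategy)
    received = [f for f in frames if f is not None]
    if not received:
        raise ValueError("at least one frame must have been received")
    silence = [0.0] * len(received[0])

    out: List[Frame] = []
    last_idx = None  # index of the last successfully received frame
    for i, frame in enumerate(frames):
        if frame is not None:
            out.append(list(frame))
            last_idx = i
            continue
        prev = frames[last_idx] if last_idx is not None else None
        if strategy == "silence":
            out.append(list(silence))
        elif strategy == "repeat":
            # Repeat the last good frame; use silence if nothing arrived yet.
            out.append(list(prev) if prev is not None else list(silence))
        else:
            # Interpolation needs the next received frame, so the receiver
            # must wait for it before playing out the concealed frame.
            next_idx = next(
                (j for j in range(i + 1, len(frames)) if frames[j] is not None),
                None,
            )
            if prev is None or next_idx is None:
                # Only one neighbor exists: fall back to copying it.
                fallback = prev if prev is not None else frames[next_idx]
                out.append(list(fallback))
            else:
                t = (i - last_idx) / (next_idx - last_idx)
                nxt = frames[next_idx]
                out.append([(1 - t) * a + t * b for a, b in zip(prev, nxt)])
    return out


def split_even_odd(samples: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Sender side: put even- and odd-indexed samples in separate packets."""
    return list(samples[0::2]), list(samples[1::2])


def rebuild_from_even(even: Sequence[float], total: int) -> List[float]:
    """Receiver side when the odd packet is lost: estimate each odd sample
    as the mean of its even neighbors (the last one copies its left neighbor)."""
    out: List[float] = []
    for k, s in enumerate(even):
        out.append(s)
        if len(out) < total:
            right = even[k + 1] if k + 1 < len(even) else s
            out.append((s + right) / 2)
    return out
```

For example, `conceal([[1.0, 2.0], None, [3.0, 6.0]], "interpolate")` fills the missing middle frame with the midpoint of its neighbors, whereas `"repeat"` copies the first frame. Only the interpolating variant has to wait for the next frame, which is why it adds delay at the receiver.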
The trade-offs between frame size and frame period are different from
those between packet size and packet period in the packet-stream layer. To
avoid excessive losses in the Internet, it is important to choose an appropriate
packet period, as long as packets are smaller than an MTU of 576 bytes and
can be sent without fragmentation [45]. (In practice, an MTU of 1,500 bytes
will not cause fragmentation.) For VoIP using IPv4, a packet period between
30 ms and 60 ms generally works well. When the frame period is shorter
than the packet period, multiple frames have to be encapsulated in a packet
before they are sent. For some codecs, the loss of a single packet can cause
a misalignment of their internal states and degrade their decoded output. For