Information Technology Reference
In-Depth Information
groups of 30 and after a certain number of frames a different group is reset until all have been reset. Predictor
reset codes are transmitted in the side data. Reset will also occur if short frames are selected.
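The rotating reset can be pictured as a round-robin schedule. This is a minimal illustration under stated assumptions, not the standard's procedure: it assumes 30 reset groups, the reset interval `interval` and the name `reset_schedule` are hypothetical, and treating a switch to short windows as a reset of every group is an assumption made for the example.

```python
# Assumption: 30 reset groups, one reset every `interval` frames (interval is hypothetical).
NUM_RESET_GROUPS = 30


def reset_schedule(num_frames, interval, short_window_frames=frozenset()):
    """Return, for each frame, the predictor group reset in that frame
    (1..NUM_RESET_GROUPS), "all" for a full reset, or None for no reset."""
    schedule = []
    next_group = 1
    for frame in range(num_frames):
        if frame in short_window_frames:
            # Assumed: selecting short windows resets every predictor.
            schedule.append("all")
        elif frame % interval == 0:
            schedule.append(next_group)
            # Round-robin: after the last group, wrap back to group 1.
            next_group = next_group % NUM_RESET_GROUPS + 1
        else:
            schedule.append(None)
    return schedule


print(reset_schedule(10, 3))  # [1, None, None, 2, None, None, 3, None, None, 4]
```

Because the groups are reset in turn, no predictor runs indefinitely without a reset, yet only a small fraction of them is disturbed in any one frame.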
In stereo and 3/2 surround formats there is less redundancy because the signals also carry spatial information. The
effect of masking may be up to 20 dB less when distortion products are at a different location in the stereo
image from the masking sounds. As a result, stereo signals require a much higher bit rate than two mono channels,
particularly on transient material, which is rich in spatial cues.
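A rough sense of what "much higher bit rate" means comes from the familiar rule that each bit of resolution buys about 6.02 dB of signal-to-noise ratio. Assuming, as a worst case and for illustration only, that the requantizing noise in an affected band must be lowered by the full 20 dB of lost masking, the extra cost per coefficient is:

```python
import math

DB_PER_BIT = 20 * math.log10(2)  # about 6.02 dB of SNR per bit of resolution


def extra_bits_for_lost_masking(masking_loss_db):
    """Extra bits per coefficient needed to push requantizing noise down by
    masking_loss_db, using the ~6 dB-per-bit rule."""
    return masking_loss_db / DB_PER_BIT


print(round(extra_bits_for_lost_masking(20.0), 2))  # 3.32
```

More than three extra bits for every coefficient in the affected bands helps explain why stereo cannot simply be coded as two independent mono channels at the same rate.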
In some cases a better result can be obtained by converting the signal to a mid-side (M/S) or sum/difference format
before quantizing. In surround sound the M/S coding can be applied to the front L/R pair and the rear L/R pair of
signals.
The M/S format can be selected on a block-by-block basis for each scale factor band.
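The conversion and the per-band choice can be sketched as follows. The half-scaled sum/difference pair used here (M = (L + R)/2, S = (L - R)/2, so that L = M + S and R = M - S) is one common convention. The text does not say how an encoder decides which bands use M/S, so the rule in the hypothetical `choose_ms_per_band`, which compares a crude magnitude-based cost after an energy-preserving rotation, is an assumed heuristic rather than the standard's method.

```python
import math


def to_mid_side(left, right):
    """Half-scaled sum/difference: M = (L + R) / 2, S = (L - R) / 2."""
    mid = [(a + b) / 2 for a, b in zip(left, right)]
    side = [(a - b) / 2 for a, b in zip(left, right)]
    return mid, side


def from_mid_side(mid, side):
    """Decoder side: L = M + S, R = M - S."""
    left = [m + s for m, s in zip(mid, side)]
    right = [m - s for m, s in zip(mid, side)]
    return left, right


def band_cost(values):
    # Hypothetical bit-cost proxy: larger magnitudes need more bits.
    return sum(abs(v) for v in values)


def choose_ms_per_band(left, right, band_offsets):
    """One M/S on/off flag per scale factor band (heuristic decision)."""
    flags = []
    for start, end in zip(band_offsets, band_offsets[1:]):
        lb, rb = left[start:end], right[start:end]
        # Energy-preserving rotation so the L/R and M/S costs are comparable.
        m = [(a + b) / math.sqrt(2) for a, b in zip(lb, rb)]
        s = [(a - b) / math.sqrt(2) for a, b in zip(lb, rb)]
        flags.append(band_cost(m) + band_cost(s) < band_cost(lb) + band_cost(rb))
    return flags


# Band 0 is identical in both channels; band 1 has unrelated content.
print(choose_ms_per_band([1, 1, 1, 0], [1, 1, 0, 1], [0, 2, 4]))  # [True, False]
```

When the two channels are nearly identical, almost all the energy moves into M and S is close to zero, which is where the sum/difference form saves bits.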
Next comes the lossy stage of the coder where distortion is selectively introduced as a function of frequency as
determined by the masking threshold. This is done by a combination of amplification and requantizing. As
mentioned, coefficients (or residuals) are grouped into scale factor bands. As Figure 4.44 shows, the number of
coefficients per band varies so that the bands approximate critical bands. Within each scale factor band, all
coefficients are multiplied by the same scale factor prior to requantizing.
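A minimal sketch of a per-band gain followed by requantizing, assuming a plain uniform rounder with unit step in place of the real non-uniform requantizer, and with hypothetical band edges in `band_offsets`. Dividing by the gain afterwards mimics the decoder, so the error can be measured on the original scale:

```python
def requantize_band(coeffs, gain):
    """Scale a band by its gain, round to integers (uniform unit-step
    requantizer, a simplification), then remove the gain as a decoder would."""
    q = [round(c * gain) for c in coeffs]
    return q, [v / gain for v in q]


def band_error(coeffs, gain):
    """Largest requantizing error, measured on the original scale."""
    _, rec = requantize_band(coeffs, gain)
    return max(abs(a - b) for a, b in zip(coeffs, rec))


def apply_scale_factors(spectrum, band_offsets, gains):
    """Requantize a spectrum band by band; every coefficient in a band
    shares that band's gain. band_offsets marks the band edges."""
    out = []
    bands = zip(band_offsets, band_offsets[1:])
    for (start, end), gain in zip(bands, gains):
        q, _ = requantize_band(spectrum[start:end], gain)
        out.extend(q)
    return out


x = [0.33, -0.71, 0.18]
print(band_error(x, 1), band_error(x, 8))  # larger gain, smaller error
```

The rounding error is at most half a step before the gain is removed, so afterwards it is at most 0.5/gain: a band given a large scale factor is reproduced more accurately, at the price of larger integers to code.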
Coefficients which have been multiplied by a large scale factor will suffer less distortion from the requantizer, whereas
those which have been multiplied by a small scale factor will have more distortion. Using scale factors, the
psychoacoustic model can shape the distortion as a function of frequency so that it remains masked. The scale
factors allow gain control in 1.5 dB steps over a dynamic range equivalent to twenty-four-bit PCM and are
transmitted as part of the side data so that the decoder can re-create the correct magnitudes. The scale factors are
differentially coded with respect to the first one in the block and the differences are then Huffman coded.
Figure 4.44: In AAC the fine-resolution coefficients are grouped together to form scale factor bands. The size of
these varies to loosely mimic the width of critical bands.
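The 1.5 dB figure follows if each scale factor step changes the gain by 2^(1/4), the law assumed here (it is the one AAC uses): 20 log10(2^(1/4)) is about 1.505 dB, so four steps equal one bit (6.02 dB) and 96 steps span the roughly 144 dB range of 24-bit PCM. The sketch also shows differential coding relative to the first scale factor in the block, as described in the text; `diff_code` and `diff_decode` are illustrative names, and the Huffman stage that follows is omitted.

```python
import math

DB_PER_BIT = 20 * math.log10(2)          # about 6.02 dB per bit of PCM
STEP_DB = 20 * math.log10(2 ** 0.25)     # about 1.505 dB per scale factor step


def sf_gain(index):
    # Assumed AAC-style law: each unit of the index multiplies the gain by 2**(1/4).
    return 2 ** (index / 4)


def diff_code(scale_factors):
    """Send the first scale factor as-is and the rest as differences from it."""
    first = scale_factors[0]
    return [first] + [sf - first for sf in scale_factors[1:]]


def diff_decode(coded):
    """Rebuild the scale factors by adding each difference back to the first."""
    first = coded[0]
    return [first] + [first + d for d in coded[1:]]


print(round(24 * DB_PER_BIT / STEP_DB))   # 96 steps span 24-bit PCM
sfs = [60, 62, 61, 58]
print(diff_code(sfs))                     # [60, 2, 1, -2]
```

The differences cluster around zero, which is what makes the subsequent Huffman coding effective.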
The requantizer uses non-uniform steps, which give better coding gain, and has a range of ±8191. The global step
size (which applies to all scale factor bands) can be adjusted in 1.5 dB steps. After requantizing, the
coefficients are Huffman coded.
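A sketch of a non-uniform requantizer in the style of AAC, assuming the 3/4 power law that AAC uses and a global step size of 2^(global_gain/4), so that one increment is about 1.5 dB. The rounding offset 0.4054 is a value commonly used by encoders; it is an encoder choice, not something the decoder depends on. Results are clamped to the ±8191 range given in the text, and the function names are illustrative.

```python
import math

MAX_Q = 8191  # requantizer range given in the text


def step_from_gain(global_gain):
    # Assumed AAC-style law: each unit of global gain is 2**(1/4), about 1.5 dB.
    return 2 ** (global_gain / 4)


def quantize(x, step):
    """Non-uniform (3/4 power-law) requantizer; 0.4054 is a commonly used
    encoder rounding offset, not a decoder requirement."""
    q = min(int((abs(x) / step) ** 0.75 + 0.4054), MAX_Q)
    return int(math.copysign(q, x))


def dequantize(q, step):
    # Decoder side: the inverse 4/3 power law restores the magnitude.
    return math.copysign(abs(q) ** (4 / 3), q) * step


step = step_from_gain(0)
levels = [dequantize(q, step) for q in range(5)]
print([round(v, 2) for v in levels])  # [0.0, 1.0, 2.52, 4.33, 6.35]
```

Because the decoder raises |q| to the power 4/3, the spacing between reconstruction levels grows with amplitude: large coefficients get coarser steps while small ones keep fine resolution.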
There are many ways in which the coder can be controlled, and any method that results in a compliant bitstream is
acceptable, although it may not reach the highest performance. The requantizing and scale factor stages need to be
controlled in order to make the best use of the available bit rate and the buffering. This is non-trivial because
the use of Huffman coding after the requantizer makes it impossible to predict the exact amount of data that will
result from a given step size. The process must therefore iterate: requantize, count the coded bits, adjust the
step sizes and repeat until the data fit.
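The iteration can be sketched as a loop that raises the global step size one 1.5 dB increment at a time, requantizes, and recounts the bits until the result fits. This is a simplified outer loop only: it ignores scale factors and the buffer, and instead of AAC's fixed Huffman codebooks it builds a Huffman code from each block's own histogram (`huffman_bits`), purely to obtain a realistic bit count. The function names, the starting gain and `max_gain` are hypothetical.

```python
import heapq
import math
from collections import Counter


def huffman_bits(symbols):
    """Bits needed to Huffman-code `symbols` with a code built from their own
    histogram (codebook transmission cost ignored). A stand-in for AAC's
    fixed codebooks."""
    counts = Counter(symbols)
    if len(counts) == 1:
        return len(symbols)  # a lone symbol still costs one bit each here
    heap = list(counts.values())
    heapq.heapify(heap)
    total = 0
    while len(heap) > 1:
        a = heapq.heappop(heap)
        b = heapq.heappop(heap)
        # Each merge lengthens the code of every symbol below it by one bit.
        total += a + b
        heapq.heappush(heap, a + b)
    return total


def quantize(x, step, max_q=8191):
    """AAC-style 3/4 power-law requantizer, clamped to +/-max_q."""
    q = min(int((abs(x) / step) ** 0.75 + 0.4054), max_q)
    return int(math.copysign(q, x))


def fit_to_budget(coeffs, bit_budget, global_gain=0, max_gain=240):
    """Raise the global step size 1.5 dB at a time until the coded block fits.
    The bit count cannot be predicted from the step size, so each pass must
    requantize and recount."""
    while global_gain <= max_gain:
        step = 2 ** (global_gain / 4)  # one gain unit is about 1.5 dB
        q = [quantize(c, step) for c in coeffs]
        bits = huffman_bits(q)
        if bits <= bit_budget:
            return global_gain, q, bits
        global_gain += 1
    raise ValueError("bit budget cannot be met")


coeffs = [100.0, -50.0, 25.0, 12.0, 6.0, 3.0, 1.0, 0.5] * 4
gain, q, bits = fit_to_budget(coeffs, bit_budget=40)
print(gain, bits)
```

The only way to learn the bit count for a given step size is to run the quantizer and the entropy coder. Practical encoders often nest a second loop that adjusts individual scale factors so that the noise in each band stays below the masking threshold.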
Whatever bit rate is selected, a good encoder will produce consistent quality by selecting window sizes and
intra- or inter-frame prediction, and by using the buffer to handle entropy peaks. This suggests a connection between buffer
occupancy and the control system. The psychoacoustic model will analyse the incoming audio entropy and during
periods of average entropy it will empty the buffer by slightly raising the quantizer step size so that the bit rate