Whilst the systems above, used alone, do allow a coding gain, the compression factor has to be limited because little
benefit is obtained from masking. This is because those techniques produce distortion which may be found anywhere
in the audio band. If the input spectrum is narrow, much of this distortion falls where there is no signal to mask it.
Sub-band coding [16] splits the audio spectrum into many different frequency bands, each of which can then be
processed individually. In real audio signals many bands will contain signals at a lower level than the loudest one,
so companding each band individually will be more effective than broadband companding. Sub-band coding also allows
the level of distortion products to be raised selectively, so that distortion is only created at frequencies where
spectral masking will be effective.
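The following toy two-band example is a sketch under stated assumptions, not a real coder. A Haar sum/difference pair stands in for the many-band filter banks that real sub-band coders use. The function names, the 4-bit wordlength and the test signal are all hypothetical: the signal is a loud low-frequency tone plus a quiet component at half the sampling rate. Each band is companded by its own block peak, so the requantizing error in the quiet band comes out far smaller than it does when one scale factor serves the whole spectrum.

```python
import math

def haar_split(x):
    """Split x (even length) into low and high bands, each at half the sampling rate."""
    low = [(x[2 * i] + x[2 * i + 1]) / 2 for i in range(len(x) // 2)]
    high = [(x[2 * i] - x[2 * i + 1]) / 2 for i in range(len(x) // 2)]
    return low, high

def haar_merge(low, high):
    """Inverse of haar_split."""
    x = []
    for lo, hi in zip(low, high):
        x.extend([lo + hi, lo - hi])
    return x

def compand(band, bits):
    """Block companding: scale by the block peak, then round to a short signed word."""
    peak = max(abs(v) for v in band) or 1.0
    step = peak / (2 ** (bits - 1) - 1)   # the peak maps to the largest code
    return [round(v / step) * step for v in band], step

def rms(values):
    return math.sqrt(sum(v * v for v in values) / len(values))

BITS = 4  # hypothetical shortened wordlength
# Loud low-frequency tone plus a quiet component at half the sampling rate
x = [math.cos(2 * math.pi * n / 64) + 0.01 * (-1) ** n for n in range(128)]

# Broadband companding: one scale factor for the whole signal
broad, _ = compand(x, BITS)
_, broad_err_high = haar_split([a - b for a, b in zip(x, broad)])

# Sub-band companding: each band gets its own scale factor
low, high = haar_split(x)
low_q, _ = compand(low, BITS)
high_q, step_high = compand(high, BITS)
decoded = haar_merge(low_q, high_q)
sub_err_high = [a - b for a, b in zip(high, high_q)]

print(f"high-band error, broadband: {rms(broad_err_high):.4f}")
print(f"high-band error, sub-band:  {rms(sub_err_high):.4f}")
```

The Haar pair is critically sampled: the two bands together hold as many samples as the input. Splitting therefore costs nothing in data rate before companding is applied.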
It should be noted that the result of reducing the wordlength of samples in a sub-band coder is often referred to as
noise. Strictly, noise is an unwanted signal which is decorrelated from the wanted signal, and this is not generally
what happens in audio compression. Although the original audio conversion would have been correctly dithered, the
linearizing random element in the low-order bits will lie some way below the end of the shortened word. If the word
is simply rounded to the nearest integer, the linearizing effect of the original dither will be lost and the result
will be quantizing distortion. As the distortion takes place in a bandlimited system, the harmonics generated will
alias back within the band. Where the requantizing takes place in a sub-band, the distortion products will be
confined to that sub-band, as shown in Figure 3.71. Such distortion is anharmonic.
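The folding arithmetic can be checked in a few lines. This sketch assumes an input tone at 5 kHz sampled at 48 kHz; these are illustrative values, not taken from the text. It lists where each harmonic created by requantizing actually lands. The same folding applies at a sub-band's own reduced sampling rate.

```python
def alias_frequency(f, fs):
    """Return the frequency at which a component at f appears when sampled at fs."""
    f = f % fs                            # the sampled spectrum repeats every fs
    return fs - f if f > fs / 2 else f    # fold about the Nyquist frequency

def harmonic_products(f0, fs, highest):
    """(harmonic number, true frequency, aliased frequency) for harmonics 2..highest."""
    return [(k, k * f0, alias_frequency(k * f0, fs)) for k in range(2, highest + 1)]

for k, true_f, heard_f in harmonic_products(5000, 48000, 10):
    # A product is anharmonic once it no longer falls on a multiple of f0
    tag = "harmonic" if heard_f % 5000 == 0 else "anharmonic"
    print(f"harmonic {k}: {true_f} Hz -> {heard_f} Hz ({tag})")
```

With these values the 5th harmonic (25 kHz) folds to 23 kHz. The 9th (45 kHz) folds to 3 kHz, which is below the fundamental itself.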
Following any perceptual coding steps, the resulting data may be further compressed losslessly using tools such as
prediction, Huffman coding or a combination of both.
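The sketch below illustrates the entropy-coding stage only. It builds a Huffman code for a short, invented run of requantized coefficient values; the prediction step is not shown. Frequent values (here, zero) receive the shortest codewords. The code is lossless: decoding returns the original symbols exactly.

```python
import heapq
from collections import Counter

def huffman_code(symbols):
    """Build a prefix code in which frequent symbols get shorter codewords."""
    freq = Counter(symbols)
    if len(freq) == 1:                       # degenerate case: one distinct symbol
        (only,) = freq
        return {only: "0"}
    # Heap entries: (count, unique tiebreak, {symbol: partial codeword})
    heap = [(count, i, {sym: ""}) for i, (sym, count) in enumerate(freq.items())]
    heapq.heapify(heap)
    tiebreak = len(heap)
    while len(heap) > 1:
        c1, _, codes1 = heapq.heappop(heap)  # two least frequent subtrees
        c2, _, codes2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in codes1.items()}
        merged.update({s: "1" + c for s, c in codes2.items()})
        heapq.heappush(heap, (c1 + c2, tiebreak, merged))
        tiebreak += 1
    return heap[0][2]

def encode(symbols, code):
    return "".join(code[s] for s in symbols)

def decode(bits, code):
    inverse = {c: s for s, c in code.items()}
    out, current = [], ""
    for b in bits:
        current += b
        if current in inverse:               # prefix property: first match is the symbol
            out.append(inverse[current])
            current = ""
    return out

coeffs = [0, 0, 1, 0, -1, 0, 0, 2, 0, 0, 0, 1, 0, 0, -1, 0]
code = huffman_code(coeffs)
bits = encode(coeffs, code)
print(len(bits))  # 24
```

A fixed-length code for these four distinct values would need 2 bits each, or 32 bits in all. The Huffman code needs 24 bits for this data.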
Audio is usually considered to be a time-domain waveform as this is what emerges from a microphone. As has
been seen in Chapter 3, spectral analysis allows any periodic waveform to be represented by a set of harmonically
related components of suitable amplitude and phase. In theory it is perfectly possible to decompose a periodic
input waveform into its constituent frequencies and phases, and to record or transmit the transform. The transform
can then be inverted and the original waveform will be precisely re-created.
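The claim that the transform can be inverted exactly is easy to check numerically. The sketch below is a direct O(N^2) DFT and inverse built on Python's cmath module; a practical system would use a fast algorithm instead. The 16-sample test waveform is an assumption made for this example. The amplitude (0.8) and phase (0.5 rad) of one component are recovered from its coefficient, and inverting the transform re-creates the samples to within rounding error.

```python
import cmath
import math

def dft(x):
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N)) for k in range(N)]

def idft(X):
    N = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * n / N) for k in range(N)) / N for n in range(N)]

N = 16
# Periodic test waveform: DC offset plus a third harmonic of amplitude 0.8, phase 0.5 rad
x = [0.2 + 0.8 * math.cos(2 * math.pi * 3 * n / N + 0.5) for n in range(N)]
X = dft(x)
amplitude = 2 * abs(X[3]) / N   # a real cosine splits equally between bins 3 and N - 3
phase = cmath.phase(X[3])
x_back = [v.real for v in idft(X)]
print(round(amplitude, 6), round(phase, 6))  # 0.8 0.5
```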
Although one can think of exceptions, the transform of a typical audio waveform changes relatively slowly much of
the time. The slow speech of an organ pipe or a violin string, or the slow decay of most musical sounds, allows the
rate at which the transform is sampled to be reduced, and a coding gain results. At some frequencies the level will
be below maximum and a shorter wordlength can be used to describe the coefficient. Further coding gain will be
achieved if the coefficients describing frequencies which will experience masking are quantized more coarsely.
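The sketch below shows both effects just described. Its assumptions are ours, not the text's: coefficients are normalized to a full scale of 1; audible coefficients use a fine step of 2^-10 and masked ones a coarse step of 2^-4; and the wordlength counted is the magnitude bits plus a sign bit. The coefficient values and mask flags are invented, and `code_block` is a hypothetical helper.

```python
def requantize(coeff, step):
    """Round a coefficient to a multiple of step; return (integer code, wordlength)."""
    q = round(coeff / step)
    wordlength = abs(q).bit_length() + 1  # magnitude bits plus one sign bit
    return q, wordlength

def code_block(coeffs, masked, fine_step=2 ** -10, coarse_step=2 ** -4):
    """Quantize masked coefficients coarsely and audible ones finely (hypothetical scheme)."""
    codes, lengths = [], []
    for c, is_masked in zip(coeffs, masked):
        q, bits = requantize(c, coarse_step if is_masked else fine_step)
        codes.append(q)
        lengths.append(bits)
    return codes, lengths

coeffs = [0.9, 0.3, 0.05, 0.6]
_, all_fine = code_block(coeffs, [False] * 4)
_, with_masking = code_block(coeffs, [False, False, True, True])
print(all_fine, sum(all_fine))          # [11, 10, 7, 11] 39
print(with_masking, sum(with_masking))  # [11, 10, 2, 5] 28
```

The coefficient at 0.9 needs 11 bits and the one at 0.3 needs 10. Quantizing the two masked coefficients coarsely saves a further 11 bits.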
In practice there are some difficulties: real sounds are not periodic, but contain transients which a transform
cannot accurately locate in time. The solution to this difficulty is to cut the waveform into short segments and then
to transform each individually. The delay is reduced, as is the computational task, but artifacts may arise from
truncating the waveform into rectangular time windows. A solution is to use window functions, and to overlap the
segments as shown in Figure 4.26. Thus every input sample appears in just two transforms, but with a weighting that
depends upon its position along the time axis.
Figure 4.26: Transform coding can only be practically performed on short blocks. These are overlapped using
window functions in order to handle continuous waveforms.
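The bookkeeping of overlapped windowing can be sketched as follows. The sketch assumes a periodic Hann window and an overlap of exactly half a block; the figure does not specify a window shape. With that choice the two weights applied to any interior sample add up to one, so overlap-adding the windowed blocks restores the input. The block length and test signal are arbitrary.

```python
import math

def hann_periodic(length):
    """Periodic Hann window: w[n] + w[n + length // 2] == 1 for every n."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * n / length) for n in range(length)]

def windowed_blocks(x, M):
    """Cut x into blocks of 2M samples that overlap by M, weighting each by the window."""
    w = hann_periodic(2 * M)
    return [(start, [w[n] * x[start + n] for n in range(2 * M)])
            for start in range(0, len(x) - 2 * M + 1, M)]

def overlap_add(blocks, total_length):
    """Sum the windowed blocks back into place on the time axis."""
    out = [0.0] * total_length
    for start, block in blocks:
        for n, v in enumerate(block):
            out[start + n] += v
    return out

M = 8
x = [math.sin(0.2 * n) for n in range(64)]
blocks = windowed_blocks(x, M)
y = overlap_add(blocks, len(x))
# Every sample from M to len(x) - M lies in exactly two blocks, so it is restored
interior_ok = all(abs(y[i] - x[i]) < 1e-12 for i in range(M, len(x) - M))
print(len(blocks), interior_ok)  # 7 True
```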
The DFT (discrete Fourier transform) does not produce a continuous spectrum, but instead produces
coefficients at discrete frequencies. The frequency resolution (i.e. the number of different frequency coefficients) is
equal to the number of samples in the window. If overlapped windows are used, twice as many coefficients are
produced as are theoretically necessary. In addition the DFT requires intensive computation, owing to the need for
complex arithmetic to render the phase of the components as well as the amplitude. An alternative is to use the
discrete cosine transform (DCT) or the modified discrete cosine transform (MDCT), which can eliminate the
coefficient overhead caused by overlapping the windows and so return to the critically sampled domain [17].
Critical sampling means that the number of coefficients does not exceed the number which would be obtained with
non-overlapping windows.
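A direct, unoptimized MDCT sketch shows critical sampling at work. Each block of 2M samples yields only M coefficients, and the hop between blocks is also M, so in the steady state there is one coefficient per input sample. The sine window, the 2/M scaling of the inverse and the helper names are assumptions of this sketch. The window satisfies w[n]^2 + w[n+M]^2 = 1 and is applied at both analysis and synthesis; with that condition, the time-domain aliasing in adjacent blocks cancels on overlap-add and the input is recovered.

```python
import math

def sine_window(M):
    """Length-2M sine window; w[n]**2 + w[n + M]**2 == 1 (Princen-Bradley condition)."""
    return [math.sin(math.pi * (n + 0.5) / (2 * M)) for n in range(2 * M)]

def _basis(n, k, M):
    return math.cos(math.pi / M * (n + 0.5 + M / 2) * (k + 0.5))

def mdct(block, M):
    """2M windowed samples in, M coefficients out."""
    return [sum(block[n] * _basis(n, k, M) for n in range(2 * M)) for k in range(M)]

def imdct(coeffs, M):
    """M coefficients in, 2M time-aliased samples out.
    The 2/M scale assumes the window is applied at both analysis and synthesis."""
    return [2.0 / M * sum(coeffs[k] * _basis(n, k, M) for k in range(M)) for n in range(2 * M)]

def mdct_codec(x, M):
    """Analyse x with half-overlapped MDCT blocks, then resynthesise by overlap-add."""
    w = sine_window(M)
    extra = (-len(x)) % M
    padded = [0.0] * M + list(x) + [0.0] * (extra + M)  # every real sample lies in two blocks
    out = [0.0] * len(padded)
    all_coeffs = []
    for start in range(0, len(padded) - M, M):          # hop of M samples
        block = [w[n] * padded[start + n] for n in range(2 * M)]
        coeffs = mdct(block, M)
        all_coeffs.append(coeffs)
        for n, y in enumerate(imdct(coeffs, M)):
            out[start + n] += w[n] * y                   # aliasing cancels between neighbours
    return all_coeffs, out[M:M + len(x)]

M = 8
x = [math.sin(0.37 * n) + 0.25 * math.cos(2.1 * n) for n in range(40)]
coeffs, y = mdct_codec(x, M)
err = max(abs(a - b) for a, b in zip(x, y))
print(f"{len(coeffs)} blocks of {M} coefficients, max error {err:.1e}")
```

A DFT of each 2M-sample block would instead give 2M complex values. That is the overhead the MDCT avoids.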
[14] Caine, C.R., English, A.R. and O'Clarey, J.W.H., NICAM-3: near-instantaneous companded digital transmission
for high-quality sound programmes. J. IERE, 50, 519-530 (1980)
[15] Davidson, G.A. and Bosi, M., AC-2: High quality audio coding for broadcast and storage. Proc. 46th Ann.
Broadcast Eng. Conf., Las Vegas, 98-105 (1992)
[16] Crochiere, R.E., Sub-band coding. Bell System Tech. J., 60, 1633-1653 (1981)