Audio compression - The MPEG

Whilst the above systems used alone do allow coding gain, the compression factor has to be limited because little benefit is obtained from masking. This is because the techniques above produce distortion which may be found anywhere over the entire audio band. If the audio input spectrum is narrow, this noise will not be masked.

Sub-band coding [16] splits the audio spectrum into many different frequency bands. Once this has been done, each band can be processed individually. In real audio signals many bands will contain lower-level signals than the loudest one, so individual companding of each band will be more effective than broadband companding. Sub-band coding also allows the level of distortion products to be raised selectively, so that distortion is only created at frequencies where spectral masking will be effective.
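As a rough illustration of why per-band companding helps, the sketch below splits a signal into just two bands with a Haar (sum and difference) filter pair, an assumption standing in for the many sharper bands of a real sub-band coder, and compares one broadband scale factor against one scale factor per band. The test signal, the 8-bit word and the helper names (haar_split, compand_requantize and so on) are hypothetical.

```python
import math

SQRT2 = math.sqrt(2.0)


def haar_split(x):
    """Two-band orthonormal Haar analysis: returns (low, high), each half as long as x."""
    half = len(x) // 2
    low = [(x[2 * m] + x[2 * m + 1]) / SQRT2 for m in range(half)]
    high = [(x[2 * m] - x[2 * m + 1]) / SQRT2 for m in range(half)]
    return low, high


def haar_merge(low, high):
    """Inverse of haar_split: rebuilds the interleaved full-band signal."""
    out = []
    for lo, hi in zip(low, high):
        out.extend(((lo + hi) / SQRT2, (lo - hi) / SQRT2))
    return out


def compand_requantize(values, bits):
    """Block companding: normalise to the block peak, round to a signed word, rescale."""
    peak = max(abs(v) for v in values) or 1.0
    levels = 2 ** (bits - 1) - 1
    return [round(v / peak * levels) / levels * peak for v in values]


def energy(values):
    """Sum of squares of a sequence."""
    return sum(v * v for v in values)


# Hypothetical input: a loud low-frequency tone plus a quiet high-frequency tone.
signal = [0.9 * math.sin(2 * math.pi * t / 64) + 0.01 * math.sin(2 * math.pi * 0.4 * t)
          for t in range(256)]
BITS = 8

# Broadband companding: one scale factor for the whole signal.
broadband = compand_requantize(signal, BITS)

# Sub-band companding: each band gets its own scale factor.
low, high = haar_split(signal)
subband = haar_merge(compand_requantize(low, BITS), compand_requantize(high, BITS))


def high_band_error(decoded):
    """Energy of the coding error that falls in the upper band."""
    error = [d - s for d, s in zip(decoded, signal)]
    return energy(haar_split(error)[1])


print("broadband:", high_band_error(broadband))
print("sub-band: ", high_band_error(subband))
```

With one scale factor for the whole signal, the rounding error is set by the loud tone and spreads roughly evenly across the spectrum, so the quiet upper band receives as much error as the loud lower band. With a scale factor per band, the upper-band error is scaled to that band's much lower peak. The crude Haar filters leak part of the loud tone into the upper band, which a practical filter bank would largely avoid.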

It should be noted that the result of reducing the wordlength of samples in a sub-band coder is often referred to as noise. Strictly, noise is an unwanted signal which is decorrelated from the wanted signal. This is not generally what happens in audio compression. Although the original audio conversion would have been correctly dithered, the linearizing random element in the low-order bits will be some way below the end of the shortened word. If the word is simply rounded to the nearest integer, the linearizing effect of the original dither will be lost and the result will be quantizing distortion. As the distortion takes place in a bandlimited system, the harmonics generated will alias back within the band. Where the requantizing process takes place in a sub-band, the distortion products will be confined to that sub-band as shown in Figure 3.71. Such distortion is anharmonic.
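The difference between noise and requantizing distortion can be checked numerically. This minimal sketch assumes a 64-sample frame holding exactly four cycles of a sine wave with a peak of 3.3 LSBs of the shortened word, and compares plain rounding with rounding after triangular-PDF dither (the dither type is an assumption; the text does not specify one). Because the undithered error repeats every cycle, its energy can only appear in bins that are multiples of the fundamental bin, that is at the harmonics and at their aliases, which in this deliberately chosen frame also land on multiples of bin 4. The dithered error is decorrelated from the signal and spreads into every bin.

```python
import cmath
import math
import random

N = 64           # frame length (assumption)
FUND_BIN = 4     # fundamental at bin 4, i.e. a period of exactly 16 samples
AMPLITUDE = 3.3  # peak level in units of the new (shortened) LSB


def power_spectrum(x):
    """Direct DFT, returning |X[k]|^2 for each bin k."""
    n = len(x)
    return [abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))) ** 2
            for k in range(n)]


def requantize_error(x, rng=None):
    """Round each sample to the nearest integer and return (rounded - original).
    If rng is given, triangular-PDF dither spanning +/-1 LSB is added before rounding."""
    error = []
    for v in x:
        dither = (rng.random() - rng.random()) if rng is not None else 0.0
        error.append(round(v + dither) - v)
    return error


def off_harmonic_energy(error):
    """Error energy in bins that are NOT multiples of the fundamental bin."""
    spectrum = power_spectrum(error)
    return sum(p for k, p in enumerate(spectrum) if k % FUND_BIN != 0)


signal = [AMPLITUDE * math.sin(2 * math.pi * FUND_BIN * t / N) for t in range(N)]
undithered = off_harmonic_energy(requantize_error(signal))
dithered = off_harmonic_energy(requantize_error(signal, random.Random(1)))
print(f"undithered: {undithered:.3e}   dithered: {dithered:.3e}")
```

With a tone whose frequency is not a submultiple of the sampling rate, harmonics above half the sampling rate fold back to frequencies that are not multiples of the fundamental, which is why such distortion is described as anharmonic.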

Following any perceptual coding steps, the resulting data may be further subjected to lossless binary compression tools such as prediction, Huffman coding or a combination of both.
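A minimal sketch of the two lossless tools mentioned above, applied here to hypothetical integer data rather than to the output of any particular perceptual coder: a first-order predictor that sends only the difference from the previous value, followed by Huffman coding of whatever is sent. Only the Python standard library is used; the function names are illustrative.

```python
import heapq
import math
from collections import Counter


def predict(samples):
    """First-order predictor: send the difference from the previous sample."""
    prev, residuals = 0, []
    for s in samples:
        residuals.append(s - prev)
        prev = s
    return residuals


def unpredict(residuals):
    """Decoder side: accumulate the differences to recover the samples exactly."""
    prev, samples = 0, []
    for r in residuals:
        prev += r
        samples.append(prev)
    return samples


def huffman_lengths(symbols):
    """Code length in bits for each distinct symbol of an optimal (Huffman) prefix code."""
    freq = Counter(symbols)
    if len(freq) == 1:
        return {next(iter(freq)): 1}
    # Heap entries: (weight, unique tie-breaker, {symbol: depth so far}).
    heap = [(w, i, {s: 0}) for i, (s, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        w1, _, a = heapq.heappop(heap)
        w2, _, b = heapq.heappop(heap)
        # Merging two subtrees pushes every symbol in them one level deeper.
        merged = {s: d + 1 for s, d in {**a, **b}.items()}
        heapq.heappush(heap, (w1 + w2, tie, merged))
        tie += 1
    return heap[0][2]


def coded_bits(symbols):
    """Total Huffman-coded size of the sequence, excluding the code table."""
    lengths = huffman_lengths(symbols)
    return sum(lengths[s] for s in symbols)


# Hypothetical integer data with slowly changing values.
samples = [round(100 * math.sin(t / 8)) for t in range(200)]
residuals = predict(samples)

print("Huffman on samples:  ", coded_bits(samples), "bits")
print("Huffman on residuals:", coded_bits(residuals), "bits")
```

Because the slowly changing values produce small, repetitive differences, the residuals need noticeably fewer Huffman-coded bits than the raw values, and the decoder's running sum restores the data exactly. A real coder would also have to transmit the code table or use a predefined one, which this count ignores.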

Audio is usually considered to be a time-domain waveform as this is what emerges from a microphone. As has been seen in Chapter 3, spectral analysis allows any periodic waveform to be represented by a set of harmonically related components of suitable amplitude and phase. In theory it is perfectly possible to decompose a periodic input waveform into its constituent frequencies and phases, and to record or transmit the transform. The transform can then be inverted and the original waveform will be precisely re-created.
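The decomposition and its inversion can be shown with a direct (slow, O(N²)) discrete Fourier transform. In this sketch the 32-sample waveform, one period of a fundamental plus a phase-shifted third harmonic, is a hypothetical example; each complex coefficient carries the amplitude and phase of one component, and the inverse transform rebuilds the samples to within floating-point rounding.

```python
import cmath
import math


def dft(x):
    """Forward DFT: one complex coefficient (amplitude and phase) per frequency bin."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(coeffs):
    """Inverse DFT: sum the components back into a time-domain waveform."""
    n = len(coeffs)
    return [sum(coeffs[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


N = 32
# One period of a hypothetical waveform: fundamental plus a phase-shifted third harmonic.
wave = [math.cos(2 * math.pi * t / N) + 0.5 * math.sin(2 * math.pi * 3 * t / N + 0.7)
        for t in range(N)]

spectrum = dft(wave)
for k in (1, 3):
    # Scale by 2/N to turn a bin magnitude into the component's peak amplitude.
    print(k, abs(spectrum[k]) * 2 / N, cmath.phase(spectrum[k]))

rebuilt = idft(spectrum)
print("max reconstruction error:", max(abs(r - w) for r, w in zip(rebuilt, wave)))
```

The amplitudes come out as 1.0 and 0.5, the empty second-harmonic bin is essentially zero, and the third-harmonic phase is 0.7 - pi/2 radians because a sine is a cosine delayed by a quarter cycle.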

Although one can think of exceptions, the transform of a typical audio waveform changes relatively slowly much of the time. The slow speech of an organ pipe or a violin string, or the slow decay of most musical sounds, allows the rate at which the transform is sampled to be reduced, and a coding gain results. At some frequencies the level will be below maximum and a shorter wordlength can be used to describe the coefficient. Further coding gain will be achieved if the coefficients describing frequencies which will experience masking are quantized more coarsely.
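The coding gain from shorter wordlengths can be put into numbers with a toy calculation. This sketch assumes a 16-bit word (a step of 2^-15) for a full-scale coefficient, four hypothetical coefficient levels, and a hypothetical masking decision that lets two of them be quantized 64 times (6 bits) more coarsely. It counts only the coefficient bits and ignores the side information a real coder would need to signal each wordlength.

```python
def wordlength(value, step):
    """Bits for a sign bit plus the magnitude of value quantized with the given step."""
    return 1 + abs(round(value / step)).bit_length()


FULL_STEP = 2 ** -15                  # 16-bit resolution for a full-scale coefficient
coeffs = [0.9, 0.3, 0.05, 0.01]       # hypothetical coefficient levels
masked = [False, False, True, True]   # hypothetical output of a masking model

# Every coefficient sent with the full 16-bit word.
fixed_bits = 16 * len(coeffs)
# Wordlength matched to each coefficient's level, same step size.
level_bits = sum(wordlength(c, FULL_STEP) for c in coeffs)
# Masked coefficients additionally quantized with a 64x coarser step.
masked_bits = sum(wordlength(c, FULL_STEP * 64 if m else FULL_STEP)
                  for c, m in zip(coeffs, masked))
print(fixed_bits, level_bits, masked_bits)
```

It prints 64 53 41: fixed 16-bit words, then words sized to each coefficient's level, then words that also exploit masking.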

In practice there are some difficulties: real sounds are not periodic, but contain transients which transformation cannot accurately locate in time. The solution to this difficulty is to cut the waveform into short segments and then to transform each individually. The delay is reduced, as is the computational task, but there is a possibility of artifacts arising because of the truncation of the waveform into rectangular time windows. A solution is to use window functions, and to overlap the segments as shown in Figure 4.26. Thus every input sample appears in just two transforms, but with variable weighting depending upon its position along the time axis.

Figure 4.26: Transform coding can only be practically performed on short blocks. These are overlapped using window functions in order to handle continuous waveforms.
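The claim that every sample appears in exactly two transforms can be checked directly. The sketch below assumes blocks of 2N samples advancing by N, with a raised-sine (sin²) window; the text does not specify the window shape. For each sample it lists the blocks that contain it and the weight applied there. The two weights always sum to one, so adding the overlapped, windowed blocks back together restores the original sample.

```python
import math


def window_coverage(num_samples, n):
    """For each sample index, list (block_start, weight) for every 2N-sample block
    containing it. Blocks advance by N (50% overlap); the first block starts at -N
    so that the first and last samples are also covered twice."""
    # Raised-sine window (assumption): sin^2 satisfies w[i] + w[i + N] == 1.
    window = [math.sin(math.pi * (i + 0.5) / (2 * n)) ** 2 for i in range(2 * n)]
    coverage = {t: [] for t in range(num_samples)}
    for start in range(-n, num_samples, n):
        for i, w in enumerate(window):
            t = start + i
            if 0 <= t < num_samples:
                coverage[t].append((start, w))
    return coverage


cov = window_coverage(num_samples=40, n=8)
print(cov[13])  # sample 13 lies in the blocks starting at 0 and 8
```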

The DFT (discrete Fourier transform) does not produce a continuous spectrum, but instead produces coefficients at discrete frequencies. The frequency resolution (i.e. the number of different frequency coefficients) is equal to the number of samples in the window. If windows overlapped by half their length are used, twice as many coefficients are produced as are theoretically necessary. In addition, the DFT requires intensive computation, owing to the requirement to use complex arithmetic to render the phase of the components as well as the amplitude. An alternative is to use the discrete cosine transform (DCT) or the modified discrete cosine transform (MDCT), which can eliminate the overhead of coefficients due to overlapping the windows and return to the critically sampled domain [17]. Critical sampling is a term which means that the number of coefficients does not exceed the number which would be obtained with non-overlapping windows.
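A compact, unoptimised sketch of the MDCT with time-domain aliasing cancellation follows. It assumes a sine window (one window that satisfies the Princen-Bradley condition w[i]² + w[i+N]² = 1), a 2/N scale factor on the inverse to match that window, and zero padding at the ends of the signal; the function names are illustrative. Each block of 2N samples yields only N coefficients, and the aliasing left in each inverse transform is cancelled when neighbouring blocks are overlap-added.

```python
import math


def sine_window(length):
    """Sine window; for length 2N it satisfies w[i]^2 + w[i + N]^2 == 1 (Princen-Bradley)."""
    return [math.sin(math.pi * (i + 0.5) / length) for i in range(length)]


def mdct(block, window):
    """2N samples in (windowed here), N coefficients out."""
    n = len(block) // 2
    return [sum(window[i] * block[i] * math.cos(math.pi / n * (i + 0.5 + n / 2) * (k + 0.5))
                for i in range(2 * n))
            for k in range(n)]


def imdct(coeffs, window):
    """N coefficients in, 2N windowed samples out (still containing time-domain aliasing)."""
    n = len(coeffs)
    return [window[i] * (2.0 / n) *
            sum(coeffs[k] * math.cos(math.pi / n * (i + 0.5 + n / 2) * (k + 0.5))
                for k in range(n))
            for i in range(2 * n)]


def mdct_roundtrip(signal, n):
    """Analyse 50%-overlapped blocks of 2N samples (hop N), then overlap-add the inverses.
    Returns the reconstruction and the total number of coefficients produced."""
    if len(signal) % n:
        raise ValueError("signal length must be a multiple of n")
    window = sine_window(2 * n)
    padded = [0.0] * n + list(signal) + [0.0] * n
    out = [0.0] * len(padded)
    total_coeffs = 0
    for start in range(0, len(padded) - n, n):
        coeffs = mdct(padded[start:start + 2 * n], window)
        total_coeffs += len(coeffs)
        for i, y in enumerate(imdct(coeffs, window)):
            out[start + i] += y  # aliasing from neighbouring blocks cancels here
    return out[n:n + len(signal)], total_coeffs


signal = [math.sin(0.3 * t) + 0.5 * math.cos(1.7 * t) for t in range(64)]
rebuilt, count = mdct_roundtrip(signal, n=8)
print(max(abs(a - b) for a, b in zip(rebuilt, signal)), count)
```

For the 64-sample test signal with N = 8 the reconstruction error is at floating-point rounding level, and 72 coefficients are produced: one per input sample plus one extra block of 8 at the signal edge. A 16-point DFT on the same overlapped blocks would need twice as many values per new sample.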

[14] Caine, C.R., English, A.R. and O'Clarey, J.W.H., NICAM-3: near-instantaneous companded digital transmission for high-quality sound programmes. J. IERE, 50, 519-530 (1980)

[15] Davidson, G.A. and Bosi, M., AC-2: High quality audio coding for broadcast and storage. Proc. 46th Ann. Broadcast Eng. Conf., Las Vegas, 98-105 (1992)

[16] Crochiere, R.E., Sub-band coding. Bell System Tech. J., 60, 1633-1653 (1981)
