Introduction to compression - The MPEG


Figure 1.16: The visual object types supported by each visual profile of MPEG-4.

1.11 Audio compression

Perceptive coding in audio relies on the principle of auditory masking, which is treated in detail in section 4.1. Masking causes the ear/brain combination to be less sensitive to sound at one frequency in the presence of another at a nearby frequency. If a first tone is present in the input, then it will mask signals of lower level at nearby frequencies. The quantizing of the first tone and of further tones at those frequencies can be made coarser. Fewer bits are needed and a coding gain results. The increased quantizing error is allowable if it is masked by the presence of the first tone.
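As a rough illustration of this coding gain, the sketch below uses the common rule of thumb that an ideal uniform quantizer gives about 6.02 dB of signal-to-noise ratio per bit, plus 1.76 dB for a full-scale sine wave. The function name and the two noise levels are assumptions chosen for illustration and are not figures from the text.

```python
import math

def bits_needed(signal_db, allowed_noise_db):
    """Shortest word length keeping quantizing noise at or below allowed_noise_db.

    Assumes the ideal uniform-quantizer rule SNR = 6.02 * bits + 1.76 dB.
    """
    required_snr = signal_db - allowed_noise_db
    return max(0, math.ceil((required_snr - 1.76) / 6.02))

# Hypothetical levels, in dB relative to the masking tone.
unmasked = bits_needed(0, -90)  # noise must stay 90 dB down (no masking)
masked = bits_needed(0, -30)    # masking lets the noise rise to 30 dB down
print(unmasked, masked, unmasked - masked)
```

With these assumed levels the word length falls from 15 bits to 5 bits. A real coder would derive the allowed noise level from a psychoacoustic model rather than from fixed numbers.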

1.11.1 Sub-band coding

Sub-band coding mimics the frequency analysis mechanism of the ear and splits the audio spectrum into a large number of different bands.

Signals in these bands can then be quantized independently. The quantizing error which results is confined to the frequency limits of the band and so it can be arranged to be masked by the program material.

The techniques used in Layers I and II of MPEG audio are based on sub-band coding, as are those used in DCC (Digital Compact Cassette).
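The idea of quantizing each band independently can be sketched with the simplest possible band split. The example below assumes a two-band Haar (sum and difference) filter pair, which is far cruder than the many-band filter banks the text describes. The function names and step sizes are hypothetical.

```python
def analyze(x):
    """Split an even-length signal into a low band and a high band (Haar pair)."""
    low = [(x[i] + x[i + 1]) / 2 for i in range(0, len(x), 2)]
    high = [(x[i] - x[i + 1]) / 2 for i in range(0, len(x), 2)]
    return low, high

def synthesize(low, high):
    """Recombine the two bands into the full-rate signal."""
    out = []
    for l, h in zip(low, high):
        out.extend([l + h, l - h])
    return out

def quantize(band, step):
    """Uniform quantizer: the error in this band is at most step / 2."""
    return [step * round(v / step) for v in band]

x = [0.9, 0.7, -0.2, -0.4, 0.1, 0.3, 0.8, 0.6]
low, high = analyze(x)

# Each band gets its own step size: fine for the low band, coarse for the high.
coded = synthesize(quantize(low, 0.05), quantize(high, 0.25))
worst_error = max(abs(a - b) for a, b in zip(x, coded))
print(worst_error)
```

Because each band has its own quantizer, the coarse step affects only the high band. A practical coder would choose each step from the masking threshold in that band.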

1.11.2 Transform coding

In transform coding the time-domain audio waveform is converted into a frequency domain representation such as a Fourier, discrete cosine or wavelet transform (see Chapter 3). Transform coding takes advantage of the fact that the amplitude or envelope of an audio signal changes relatively slowly and so the coefficients of the transform can be transmitted relatively infrequently. Clearly such an approach breaks down in the presence of transients, and adaptive systems are required in practice. Transients cause the coefficients to be updated frequently whereas in stationary parts of the signal such as sustained notes the update rate can be reduced. Discrete cosine transform (DCT) coding is used in Layer III of MPEG audio and in the compression system of the Sony MiniDisc.
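To make the frequency domain representation concrete, the following sketch applies a plain DCT-II to one short block and inverts it. It assumes an unwindowed eight-sample block and a test tone that coincides with one DCT basis function. The block length and function names are illustrative only and are not taken from any MPEG layer.

```python
import math

def dct(block):
    """Unnormalized DCT-II of one block of samples."""
    n = len(block)
    return [
        sum(x * math.cos(math.pi / n * (i + 0.5) * k) for i, x in enumerate(block))
        for k in range(n)
    ]

def idct(coeffs):
    """Inverse of dct() above (a scaled DCT-III)."""
    n = len(coeffs)
    return [
        (coeffs[0] / 2
         + sum(coeffs[k] * math.cos(math.pi / n * (i + 0.5) * k) for k in range(1, n)))
        * 2 / n
        for i in range(n)
    ]

N = 8
# A steady tone chosen to coincide with basis function k = 3 (an assumption).
tone = [math.cos(math.pi / N * (i + 0.5) * 3) for i in range(N)]
coeffs = dct(tone)
print([round(c, 6) for c in coeffs])
```

For this idealized tone only one coefficient is significantly non-zero, so the whole block is described by a single value. Real signals spread energy over several coefficients, and a transient spreads it over many.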

1.11.3 Predictive coding

In a predictive coder there are two identical predictors, one in the coder and one in the decoder. Their job is to examine a run of previous data values and to extrapolate forward to estimate or predict what the next value will be. This is subtracted from the actual next code value at the encoder to produce a prediction error which is transmitted. The decoder then adds the prediction error to its own prediction to obtain the output code value again.
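The encoder and decoder loop described above can be sketched as follows. The predictor here is an assumed straight-line extrapolation from the two previous values, chosen only because it is easy to follow. The function names and sample values are hypothetical.

```python
def predict(history):
    """Extrapolate the next value from the values seen so far."""
    if not history:
        return 0
    if len(history) == 1:
        return history[-1]
    return 2 * history[-1] - history[-2]  # continue the last slope

def encode(samples):
    """Return the prediction errors that would be transmitted."""
    history, errors = [], []
    for s in samples:
        errors.append(s - predict(history))
        history.append(s)
    return errors

def decode(errors):
    """Rebuild the samples by adding each error to the decoder's own prediction."""
    history = []
    for e in errors:
        history.append(predict(history) + e)
    return history

samples = [10, 12, 14, 16, 19, 22]
errors = encode(samples)
print(errors)          # [10, 2, 0, 0, 1, 0]
print(decode(errors))  # [10, 12, 14, 16, 19, 22]
```

Both sides run the same `predict` function on the same history, so the decoder recovers the input exactly. The errors are mostly small numbers, which need fewer bits than the samples themselves.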

Prediction can be used in the time domain, where sample values are predicted, or in the frequency domain, where coefficient values are predicted. Time-domain predictive coders work with a short encode and decode delay and are useful in telephony where a long loop delay causes problems. Frequency prediction is used in AC-3 and MPEG AAC.

1.12 MPEG bitstreams

The MPEG
