4.20 MPEG Layer III audio coding
Layer III is the most complex layer, and is only really necessary when the most severe data rate constraints must
be met. It is also known as MP3 when used for music delivery over the Internet. It is a transform coder based
on the ASPEC system with certain modifications to give a degree of commonality with Layer II. The original ASPEC
coder used a direct MDCT on the input samples. In Layer III this was modified to use a hybrid transform
incorporating the existing polyphase 32 band QMF of Layers I and II and retaining the block size of 1152 samples.
In Layer III, the 32 sub-bands from the QMF are further processed by a critically sampled MDCT.
The windows overlap by two to one. Two window sizes are used to reduce pre-echo on transients. The long
window works with 36 sub-band samples, which corresponds to 24 milliseconds only at a sampling rate of 48 kHz, and
resolves 18 different frequencies in each sub-band, making 576 frequencies altogether. Coding products are spread
over this period, which is acceptable in stationary material but not in the vicinity of transients. In this case the
window length is reduced to 8 ms. Twelve sub-band samples are resolved into six different frequencies in each
sub-band, making a total of 192 frequencies. This is the Heisenberg inequality at work: by increasing the time
resolution by a factor of three, the frequency resolution has fallen by the same factor.
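The figures quoted above follow from simple arithmetic, which the short Python sketch below reproduces. The function name window_figures is purely illustrative. The sketch assumes only what the text states: 32 sub-bands, a 48 kHz sampling rate, and a two-to-one window overlap, so that each window yields half as many frequencies as it has sub-band samples.

```python
SUBBANDS = 32             # bands in the polyphase filter bank
SAMPLE_RATE_HZ = 48_000   # the rate at which the quoted durations apply

def window_figures(subband_samples):
    """Return (duration in ms, frequencies per sub-band, total frequencies)."""
    # The filter bank decimates by 32, so each sub-band sample
    # spans 32 input samples.
    input_samples = subband_samples * SUBBANDS
    duration_ms = 1000 * input_samples / SAMPLE_RATE_HZ
    # With a 2:1 overlap, a critically sampled MDCT delivers half as
    # many coefficients as there are samples in the window.
    freqs_per_band = subband_samples // 2
    return duration_ms, freqs_per_band, freqs_per_band * SUBBANDS

long_win = window_figures(36)
short_win = window_figures(12)
print(long_win)    # (24.0, 18, 576)
print(short_win)   # (8.0, 6, 192)
```

At other sampling rates the same sample counts apply but the durations scale accordingly, which is why the 24 ms figure holds only at 48 kHz.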
Figure 4.36 shows the available window types. In addition to the long and short symmetrical windows there is a pair
of transition windows, known as start and stop windows which allow a smooth transition between the two window
sizes. In order to use critical sampling, MDCTs must resolve into a number of frequencies which is a multiple of four.
Switching between 576 and 192 frequencies allows this criterion to be met. Note that an 8 ms window is still too
long to eliminate pre-echo. Pre-echo is eliminated using buffering. The use of a short window minimizes the size of
the buffer needed.
Figure 4.36: The window functions of Layer III coding. At (a) is the normal long window, whereas (b) shows the
short window used to handle transients. Switching between window sizes requires transition windows (c) and (d).
An example of switching using transition windows is shown in (e).
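The four window types can be sketched in a few lines of Python. The text does not give the window formulas, so sine-shaped windows are assumed here, and the names LONG, SHORT, START, STOP and power_complementary are illustrative only. The check at the end tests the property that makes the transitions smooth: wherever one window falls and the next rises, the squares of the two sum to one, which is the usual condition for the overlapped MDCT blocks to cancel their time-domain aliasing.

```python
import math

def sine_window(n):
    """Symmetrical sine window of n points (an assumed shape)."""
    return [math.sin(math.pi * (i + 0.5) / n) for i in range(n)]

LONG = sine_window(36)    # (a) normal long window, 36 sub-band samples
SHORT = sine_window(12)   # (b) short window for transients, 12 samples

# (c) start window: long rise, flat top, short fall, then zeros.
START = LONG[:18] + [1.0] * 6 + SHORT[6:] + [0.0] * 6
# (d) stop window: the start window reversed in time.
STOP = START[::-1]

def power_complementary(falling, rising):
    """True if falling[i]**2 + rising[i]**2 is 1 across the overlap."""
    return all(abs(f * f + r * r - 1.0) < 1e-12
               for f, r in zip(falling, rising))

# long -> start: the long fall overlaps the long rise of the start window.
print(power_complementary(LONG[18:], START[:18]))
# start -> short: the short fall of the start window meets a short rise.
print(power_complementary(START[24:30], SHORT[:6]))
# short -> stop: a short fall meets the short rise of the stop window.
print(power_complementary(SHORT[6:], STOP[6:12]))
# stop -> long: the long fall of the stop window meets a long rise.
print(power_complementary(STOP[18:], LONG[:18]))
```

All four checks print True under these assumptions, which is why a coder can move from long to short windows and back, as in (e), only by passing through the start and stop windows.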
Layer III provides a suggested (but not compulsory) psychoacoustic model which is more complex than that
suggested for Layers I and II, primarily because of the need for window switching. Pre-echo is associated with the
entropy in the audio rising above the average value and this can be used to switch the window size. The perceptive
model is used to take advantage of the high frequency resolution available from the MDCT, which allows the noise
floor to be shaped much more accurately than with the 32 sub-bands of Layers I and II. Although the MDCT has
high frequency resolution, it does not carry the phase of the waveform in an identifiable form and so is not useful
for discriminating between tonal and atonal inputs. As a result a side FFT which gives conventional amplitude and
phase data is still required to drive the masking model.
Non-uniform quantizing is used, in which the quantizing step size becomes larger as the magnitude of the
coefficient increases. The quantized coefficients are then subject to Huffman coding. This is a technique where the
most common code values are allocated the shortest wordlengths. Layer III also has a certain amount of buffer
memory so that pre-echo can be avoided during entropy peaks despite a constant output bit rate.
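The two ideas in this paragraph can be illustrated with a minimal Python sketch. The text says only that the step size grows with coefficient magnitude; a power law with an exponent of 3/4 is assumed here as one way of obtaining that behaviour. The Huffman routine builds a code from the data simply to show the principle, whereas a real coder would be expected to use tables fixed in advance. All function names are hypothetical.

```python
import heapq
from collections import Counter

def quantize(x, step):
    """Non-uniform quantizer: compress the magnitude, then round."""
    sign = -1 if x < 0 else 1
    return sign * round((abs(x) / step) ** 0.75)   # assumed 3/4 power law

def interval_width(q, step):
    """Width of the input range that rounds to level q (q >= 1)."""
    return step * ((q + 0.5) ** (4 / 3) - (q - 0.5) ** (4 / 3))

def huffman_code_lengths(symbols):
    """Return {symbol: wordlength in bits} for a Huffman code."""
    counts = Counter(symbols)
    if len(counts) == 1:
        return {next(iter(counts)): 1}
    # Heap entries: (count, unique tie-breaker, symbols in this subtree).
    heap = [(n, i, [s]) for i, (s, n) in enumerate(counts.items())]
    heapq.heapify(heap)
    lengths = dict.fromkeys(counts, 0)
    tie = len(heap)
    while len(heap) > 1:
        n1, _, s1 = heapq.heappop(heap)
        n2, _, s2 = heapq.heappop(heap)
        for s in s1 + s2:          # merged symbols sit one level deeper
            lengths[s] += 1
        heapq.heappush(heap, (n1 + n2, tie, s1 + s2))
        tie += 1
    return lengths

# The effective step grows with the level being coded.
print([round(interval_width(q, 1.0), 2) for q in (1, 5, 20)])

# Small values dominate typical spectra, so they get short codes.
print(huffman_code_lengths([0, 0, 0, 0, 1, 1, 2]))   # {0: 1, 1: 2, 2: 2}
```

In the example the value 0 occurs most often and receives a one-bit code, while the rarer values receive two bits each.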
Figure 4.37 shows a Layer III encoder. The output from the sub-band filter is 32 continuous band-limited sample
streams. These are subject to 32 parallel MDCTs. The window size can be switched individually in each sub-band
as required by the characteristics of the input audio. The parallel FFT drives the masking model which decides on
window sizes as well as producing the masking threshold for the coefficient quantizer. The distortion control loop
iterates until the available output data capacity is reached with the most uniform NMR. The available output
capacity can vary owing to the presence of the buffer.
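The rate-limiting part of such a loop can be sketched as follows. This is a simplified illustration and not the encoder of Figure 4.37: it uses a single global step size, an assumed 3/4 power-law quantizer, and a hypothetical bit-cost estimate in place of real Huffman tables. The per-band noise shaping needed to make the NMR uniform is omitted. The loop coarsens the step until the block fits the bits available, and bits borrowed from the buffer appear simply as a larger budget.

```python
import math

def quantize(x, step):
    """Assumed power-law quantizer (exponent 3/4)."""
    return int(math.copysign(round((abs(x) / step) ** 0.75), x))

def estimated_bits(levels):
    """Hypothetical cost model: magnitude bits plus one bit per value."""
    return sum(abs(q).bit_length() + 1 for q in levels)

def rate_loop(coeffs, bit_budget, step=0.01, growth=2 ** 0.25, max_iters=200):
    """Coarsen the step until the quantized block fits the budget."""
    for _ in range(max_iters):
        levels = [quantize(c, step) for c in coeffs]
        bits = estimated_bits(levels)
        if bits <= bit_budget:
            return step, levels, bits
        step *= growth             # coarser quantizing, fewer bits
    raise ValueError("bit budget too small for this block")

coeffs = [9.0, -7.5, 4.2, 3.1, -2.0, 1.4, 0.9, -0.6,
          0.4, 0.3, -0.2, 0.1, 0.1, 0.0, 0.0, 0.0]

tight = rate_loop(coeffs, bit_budget=32)         # nominal allocation only
relaxed = rate_loop(coeffs, bit_budget=32 + 24)  # plus bits from the buffer
print(tight[0], tight[2])
print(relaxed[0], relaxed[2])
```

With the larger budget the loop stops at a step size no coarser than before, so a demanding block can be coded with less quantizing noise while the output bit rate stays constant.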