Audio Coding - Introduction to Data Compression

17.3.3 Layer III Coding—mp3

Layer III coding, which has become widely popular under the name mp3, is considerably more complex than the Layer I and Layer II coding schemes. One of the problems with the Layer I and II coding schemes was that with 32-band decomposition, the bandwidth of the subbands at lower frequencies is significantly larger than the critical bands. This makes it difficult to make an accurate judgement of the mask-to-signal ratio. If we get a high-amplitude tone within a subband and the subband is narrow enough, we can assume that it masks other tones in the band. However, if the bandwidth of the subband is significantly larger than the critical bandwidth at that frequency, it becomes more difficult to determine whether other tones in the subband will be masked.
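
The mismatch between subband width and critical bandwidth can be made concrete with a small calculation. The sketch below is illustrative only: the 44.1 kHz sampling rate is an assumption (the text does not fix one), and the critical bandwidth is estimated with the Zwicker-Terhardt approximation, which is not taken from this section. The names `critical_bandwidth_hz` and `subband_report` are hypothetical helpers.

```python
def critical_bandwidth_hz(f_hz):
    """Zwicker-Terhardt approximation of the critical bandwidth (Hz) around f_hz."""
    return 25.0 + 75.0 * (1.0 + 1.4 * (f_hz / 1000.0) ** 2) ** 0.69


SAMPLE_RATE_HZ = 44100.0  # assumed sampling rate, not stated in the text
NUM_SUBBANDS = 32         # uniform 32-band decomposition of Layers I and II

# Every subband of a uniform filterbank has the same width.
subband_width_hz = (SAMPLE_RATE_HZ / 2.0) / NUM_SUBBANDS


def subband_report(num_bands=NUM_SUBBANDS, fs_hz=SAMPLE_RATE_HZ):
    """Return (band index, center frequency, subband width / critical bandwidth)."""
    width = (fs_hz / 2.0) / num_bands
    rows = []
    for band in range(num_bands):
        center = (band + 0.5) * width
        rows.append((band, center, width / critical_bandwidth_hz(center)))
    return rows


for band, center, ratio in subband_report()[:4]:
    print(f"band {band:2d}: center {center:7.1f} Hz, width/critical = {ratio:.2f}")
```

Under these assumptions the lowest subbands come out several times wider than the critical band at their center frequency, while the highest subbands are narrower than it, which is the situation the paragraph describes.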

A simple way to increase the spectral resolution would be to decompose the signal directly into a higher number of bands. However, one of the requirements on the Layer III algorithm is that it be backward compatible with Layer I and Layer II coders. To satisfy this backward compatibility requirement, the spectral decomposition in the Layer III algorithm is performed in two stages. First, the 32-band subband decomposition used in Layer I and Layer II is employed. The output of each subband is then transformed using a modified discrete cosine transform (MDCT) with a 50% overlap. The Layer III algorithm specifies two sizes for the MDCT: 6 or 18. This means that the output of each subband can be decomposed into either 18 frequency coefficients or 6 frequency coefficients.
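
The second stage can be sketched with a generic MDCT. The code below is a minimal illustration, not the bit-exact Layer III transform: it assumes a sine window and a 2/N inverse scaling, both common textbook choices rather than details taken from this section, and the function names are hypothetical. It shows that a block of 2N subband samples yields N coefficients (36 to 18, or 12 to 6) and that 50% overlap-add recovers the input when nothing is quantized.

```python
import numpy as np


def sine_window(length):
    """Sine window; satisfies w[n]**2 + w[n + length // 2]**2 == 1."""
    n = np.arange(length)
    return np.sin(np.pi * (n + 0.5) / length)


def mdct(block):
    """Map 2N time samples to N MDCT coefficients."""
    n_coeffs = len(block) // 2
    n = np.arange(2 * n_coeffs)
    k = np.arange(n_coeffs)
    basis = np.cos(np.pi / n_coeffs * np.outer(k + 0.5, n + 0.5 + n_coeffs / 2))
    return basis @ block


def imdct(coeffs):
    """Map N coefficients back to 2N time-aliased samples."""
    n_coeffs = len(coeffs)
    n = np.arange(2 * n_coeffs)
    k = np.arange(n_coeffs)
    basis = np.cos(np.pi / n_coeffs * np.outer(n + 0.5 + n_coeffs / 2, k + 0.5))
    return (2.0 / n_coeffs) * (basis @ coeffs)


def analyze_synthesize(subband_samples, n_coeffs):
    """Windowed MDCT and inverse with 50% overlap-add.

    Assumes len(subband_samples) is a multiple of n_coeffs.
    """
    window = sine_window(2 * n_coeffs)
    pad = np.zeros(n_coeffs)
    padded = np.concatenate([pad, subband_samples, pad])
    output = np.zeros_like(padded)
    for start in range(0, len(padded) - 2 * n_coeffs + 1, n_coeffs):
        coeffs = mdct(window * padded[start:start + 2 * n_coeffs])
        output[start:start + 2 * n_coeffs] += window * imdct(coeffs)
    return output[n_coeffs:-n_coeffs]


rng = np.random.default_rng(0)
x = rng.standard_normal(90)  # stand-in for the output of one subband
for size in (18, 6):
    y = analyze_synthesize(x, size)
    print(size, "coefficients per block, max error:", np.max(np.abs(y - x)))
```

Each block alone is aliased in time; the aliasing cancels only when adjacent overlapping blocks are added, which is why the transform can produce N coefficients per N new samples despite the 50% overlap.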

The reason for having two sizes for the MDCT is that when we transform a sequence into the frequency domain, we lose time resolution even as we gain frequency resolution. The larger the block size, the more we lose in terms of time resolution. The problem with this is that any quantization noise introduced into the frequency coefficients will get spread over the entire block of the transform. Backward temporal masking occurs for only a short duration prior to the masking sound (approximately 20 msec). Therefore, quantization noise that is spread over a longer interval ahead of a sharp onset will be heard as a pre-echo. Consider the signal shown in Figure 17.7. The sequence consists of 128 samples, the first 118 of which are 0, followed by a sharp increase in value. The 128-point DCT of this sequence is shown in Figure 17.8. Notice that many of these coefficients are quite large. If we were to send all these coefficients, we would have data expansion instead of data compression. If we keep only the 10 largest coefficients, we obtain the reconstructed signal shown in Figure 17.9. Notice that not only are the nonzero signal values not well represented, there is also error in the samples prior to the change in value of the signal. If this were an audio signal and the large values had occurred at the beginning of the sequence, the forward masking effect would have reduced the perceptibility of the quantization error. In the situation shown in Figure 17.9, backward masking will mask some of the quantization error. However, backward masking occurs for only a short duration prior to the masking sound. Therefore, if the length of the block in question is longer than the masking interval, the distortion will be evident to the listener.
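
The experiment described above can be approximated numerically. The sketch below is a rough reconstruction under stated assumptions: the text gives only the block length (128) and the onset position (sample 118), so the value of the last ten samples (set to 1.0 here) is assumed, and an orthonormal DCT-II is used as the 128-point DCT. The helper names are hypothetical.

```python
import numpy as np


def dct_matrix(size):
    """Orthonormal DCT-II matrix; its transpose is the inverse transform."""
    n = np.arange(size)
    k = n.reshape(-1, 1)
    matrix = np.sqrt(2.0 / size) * np.cos(np.pi * (2 * n + 1) * k / (2 * size))
    matrix[0, :] /= np.sqrt(2.0)
    return matrix


def keep_largest(coeffs, count):
    """Zero every coefficient except the `count` largest in magnitude."""
    kept = np.zeros_like(coeffs)
    largest = np.argsort(np.abs(coeffs))[-count:]
    kept[largest] = coeffs[largest]
    return kept


BLOCK = 128  # block length from the text
ONSET = 118  # the first 118 samples are zero

signal = np.zeros(BLOCK)
signal[ONSET:] = 1.0  # assumed amplitude; the text only says "a sharp increase"

C = dct_matrix(BLOCK)
coeffs = C @ signal
reconstruction = C.T @ keep_largest(coeffs, 10)

error = reconstruction - signal
pre_onset_error = np.max(np.abs(error[:ONSET]))
print("largest error before the onset:", pre_onset_error)
```

The original signal is exactly zero before sample 118, so any nonzero `pre_onset_error` is distortion that precedes the sound that caused it, which is the pre-echo effect the paragraph describes.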

If we get a sharp sound that is very limited in time (such as the sound of castanets), we would like to keep the block size small enough that the block does not extend much beyond this sharp sound. Then, when we incur quantization noise, it will not get spread outside the interval in which the actual sound occurred and will therefore get masked. The Layer III algorithm monitors the input and, where necessary, substitutes three short transforms for one long transform. What actually happens is that the subband output is multiplied by a window function of length 36 during
