Digital Signal Processing Reference
In-Depth Information
(a) Layer 1: Header (32) | CRC (0, 16) | Bit allocation (128-256) | Scale factors (0-384) | Samples | Ancillary data
(b) Layer 2: Header (32) | CRC (0, 16) | Bit allocation (26-256) | SCFSI (0-60) | Scale factors (0-1080) | Samples | Ancillary data
(c) Layer 3: Header (32) | CRC (0, 16) | Side information (136-256) | Main data; not necessarily linked to this frame
FIGURE 11.17
MPEG audio frame formats.
Using one scale factor for three data segments is appropriate when the scale factors per subband are
sufficiently close and the encoder applies temporal noise masking (a type of noise masking by the
human auditory system) to hide any distortion. In Figure 11.17, the field “SCFSI” (scale-factor
selection information) tells the decoder how the scale factors are shared. When audible distortion
must be avoided, a different scale factor is used for each subband channel. The bit allocation can
also assign a single compact code word to represent three consecutive quantized values.
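The decision to share a scale factor can be illustrated with a simplified sketch. It assumes that the scale factor of a segment is just its peak magnitude and that "sufficiently close" means the peak levels lie within a tolerance in decibels. The function name, the 2 dB default, and the use of raw peaks instead of a scale-factor table are all assumptions made for illustration, not the rule used by the standard.

```python
import math


def choose_scale_factors(segments, tolerance_db=2.0):
    """Return one shared scale factor if the three segment peaks are close,
    otherwise one scale factor per segment (simplified, hypothetical rule)."""
    peaks = [max(abs(sample) for sample in segment) for segment in segments]
    levels_db = [20.0 * math.log10(max(peak, 1e-12)) for peak in peaks]
    if max(levels_db) - min(levels_db) <= tolerance_db:
        # One factor covers all three segments; the largest peak avoids clipping.
        return [max(peaks)]
    return peaks


# Three steady segments of 12 samples each: the peaks are within 2 dB.
steady = [[0.50] * 12, [0.52] * 12, [0.48] * 12]
# A transient in the middle segment: the peaks differ by far more than 2 dB.
transient = [[0.01] * 12, [0.90] * 12, [0.10] * 12]

print(choose_scale_factors(steady))     # [0.52]
print(choose_scale_factors(transient))  # [0.01, 0.9, 0.1]
```

In a real bitstream the outcome of such a decision is what the SCFSI field would signal to the decoder; this sketch only returns the chosen factors.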
The layer 3 frame contains side information and main data. The main data come from Huffman encoding
(lossless coding with exact recovery) of the W-MDCT coefficients, which improves compression over
layer 1 and layer 2.
Figure 11.18 shows the MPEG-1 layer 1 and 2 encoder, and the layer 3 encoder. For MPEG-1 layer
1 and layer 2, the encoder examines the audio input samples using a 1,024-point fast Fourier transform
(FFT). The psycho-acoustic model is evaluated on the FFT coefficients. This analysis covers possible
frequency masking (hiding noise in the frequency domain) and temporal noise masking (hiding noise in
the time domain). The result of the psycho-acoustic analysis drives the bit allocation scheme.
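The first step of this analysis, obtaining a spectrum from a 1,024-point FFT, can be sketched as follows. This is only the spectral front end, not a psycho-acoustic model: the Hann window, the decibel floor, and all function names are assumptions made for illustration.

```python
import cmath
import math

FFT_SIZE = 1024


def fft(values):
    """Recursive radix-2 FFT; the input length must be a power of two."""
    n = len(values)
    if n == 1:
        return [complex(values[0])]
    even = fft(values[0::2])
    odd = fft(values[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def power_spectrum_db(frame):
    """Hann-window one frame and return the power in dB for bins 0..N/2."""
    n = len(frame)
    if n != FFT_SIZE:
        raise ValueError("expected a frame of %d samples" % FFT_SIZE)
    windowed = [s * 0.5 * (1.0 - math.cos(2.0 * math.pi * i / n))
                for i, s in enumerate(frame)]
    spectrum = fft(windowed)
    return [10.0 * math.log10(abs(c) ** 2 + 1e-12) for c in spectrum[: n // 2 + 1]]


# A pure tone placed exactly on bin 64 of the 1,024-point transform.
tone = [math.sin(2.0 * math.pi * 64 * i / FFT_SIZE) for i in range(FFT_SIZE)]
spectrum_db = power_spectrum_db(tone)
peak_bin = max(range(len(spectrum_db)), key=spectrum_db.__getitem__)
print(peak_bin)  # 64
```

A masking analysis would start from such a spectrum, for example by locating tonal peaks like the one at bin 64 and estimating how much quantization noise the neighboring bins can hide.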
The major difference in layer 3, called MP3 (the most popular format in the multimedia industry),
is that it adopts the MDCT. The encoder can gain further data compression by transforming the
data segments from each subband channel using the DCT and then quantizing the DCT coefficients,
which, again, are losslessly compressed using Huffman encoding. As shown in Examples 11.8 to
11.11, since the DCT uses block-based processing, it produces block edge effects: the beginning
and ending samples of each block show discontinuities that cause audible periodic noise. This periodic
edge noise can be alleviated, as discussed in the previous section, by using the W-MDCT, in which
there is 50% overlap between successive transform windows.
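The effect of the 50% overlap can be checked numerically. The sketch below applies a windowed MDCT to overlapping blocks, inverts each block, and overlap-adds the results. It assumes a sine window and a 2/N scaling of the inverse transform, which is one common convention and may differ from the one used in the previous section. The block length of 36 and the function names are illustrative choices.

```python
import math


def sine_window(block_len):
    """Sine window; satisfies w[n]^2 + w[n + N]^2 = 1 for block_len = 2N."""
    return [math.sin(math.pi * (n + 0.5) / block_len) for n in range(block_len)]


def mdct(block):
    """Map 2N windowed samples to N MDCT coefficients."""
    half = len(block) // 2
    return [sum(block[n] * math.cos(math.pi / half * (n + 0.5 + half / 2.0) * (k + 0.5))
                for n in range(2 * half))
            for k in range(half)]


def imdct(coeffs):
    """Map N coefficients back to 2N time-aliased samples (2/N scaling assumed)."""
    half = len(coeffs)
    return [(2.0 / half) * sum(coeffs[k] * math.cos(math.pi / half * (n + 0.5 + half / 2.0) * (k + 0.5))
                               for k in range(half))
            for n in range(2 * half)]


def overlap_add_roundtrip(signal, block_len=36):
    """Window, transform, invert, window again, and overlap-add with 50% overlap."""
    hop = block_len // 2
    window = sine_window(block_len)
    out = [0.0] * len(signal)
    for start in range(0, len(signal) - block_len + 1, hop):
        block = [signal[start + i] * window[i] for i in range(block_len)]
        rebuilt = imdct(mdct(block))
        for i in range(block_len):
            out[start + i] += rebuilt[i] * window[i]
    return out


signal = [math.sin(0.3 * n) + 0.5 * math.cos(1.1 * n) for n in range(108)]
rebuilt = overlap_add_roundtrip(signal)
# Only samples covered by two overlapping blocks are fully reconstructed.
max_error = max(abs(signal[n] - rebuilt[n]) for n in range(18, 90))
print(max_error < 1e-9)  # True
```

Each inverted block contains time-domain aliasing on its own, but the aliasing terms of neighboring blocks cancel in the overlap region, so no discontinuity is left at the block edges. The first and last half-blocks are covered by only one window and are therefore excluded from the error check.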
MPEG-1 layer 3 (MP3) audio uses two window sizes: one with 36 samples and the other with 12 samples.
The larger block length offers better frequency resolution for low-frequency tonelike signals; hence
it is used for the lowest two subbands. For the rest of the subbands, the shorter block is used,
since it allows better time resolution for noiselike transient signals. Other improvements of MP3
 