to trigger switching in the block length of the MDCT transform and to produce the threshold
values used to determine scalefactors and quantization thresholds. The audio data is fed in
parallel to both the acoustic model and to the modified discrete cosine transform.
Block Switching and MDCT
Because the AAC algorithm is not backward compatible, it does away with the requirement of
the 32-band filter bank. Instead, the frequency decomposition is accomplished by a modified
discrete cosine transform (MDCT). The MDCT is described in Chapter 13. The AAC algorithm
allows switching between a window length of 2048 samples and 256 samples. These window
lengths include a 50% overlap with neighboring blocks. So 2048 time samples are used to
generate 1024 spectral coefficients, and 256 time samples are used to generate 128 frequency
coefficients. The $k$th spectral coefficient of block $i$, $X_{i,k}$, is given by

$$X_{i,k} = 2 \sum_{n=0}^{N-1} z_{i,n} \cos\left( \frac{2\pi (n + n_o)}{N} \left( k + \frac{1}{2} \right) \right)$$

where $z_{i,n}$ is the $n$th time sample of the $i$th block, $N$ is the window length, and

$$n_o = \frac{N/2 + 1}{2}$$
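As an illustration, the transform of a single block can be evaluated directly from its definition: each coefficient is twice the sum over n of z[n] cos(2π(n + n_o)(k + 1/2)/N), with n_o = (N/2 + 1)/2. The following is a minimal sketch, not an encoder implementation. The function name `mdct_block` is hypothetical, the evaluation is the slow O(N^2) form rather than a fast algorithm, and no window function is applied to the samples.

```python
import math

def mdct_block(z):
    """Directly evaluate the MDCT of one block of N time samples.

    Returns N/2 coefficients, matching the 50% overlap between blocks.
    (Hypothetical helper for illustration; windowing is omitted.)
    """
    N = len(z)
    n0 = (N / 2 + 1) / 2  # phase offset n_o from the definition
    return [
        2.0 * sum(
            z[n] * math.cos(2.0 * math.pi * (n + n0) / N * (k + 0.5))
            for n in range(N)
        )
        for k in range(N // 2)
    ]

# A block of 16 samples yields 8 coefficients.
block = [math.sin(0.3 * n) for n in range(16)]
print(len(mdct_block(block)))  # prints 8
```

The same call with a 2048-sample block would return 1024 coefficients and with a 256-sample block 128 coefficients, although a practical coder would use a fast algorithm rather than this direct sum.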
The longer block length allows the algorithm to take advantage of stationary portions of the
input to get significant improvements in compression. The short block length allows the
algorithm to handle sharp attacks without incurring substantial distortion and rate penalties.
Short blocks occur in groups of eight in order to avoid framing issues. As in the case of MPEG
Layer III, there are four kinds of windows: long, short, start, and stop. The decision about
whether to use a group of short blocks is made by the psychoacoustic model. The coefficients
are divided into scalefactor bands in which the number of coefficients in the bands reflects the
critical bandwidth. Each scalefactor band is assigned a single scalefactor. The exact division
of the coefficients into scalefactor bands for the different windows and different sampling rates
is specified in the standard [221].
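The block-length arithmetic can be checked with a short sketch. The constant and function names below are illustrative and do not come from the standard. The sketch only shows that, with 50% overlap, a group of eight short blocks produces as many coefficients as one long block, which is why grouping short blocks in eights keeps the framing unchanged.

```python
LONG_WINDOW = 2048          # samples per long window
SHORT_WINDOW = 256          # samples per short window
SHORT_BLOCKS_PER_GROUP = 8  # short blocks always occur in groups of eight

def coefficients_per_block(window_length):
    # With 50% overlap, a window of N samples yields N/2 coefficients.
    return window_length // 2

def frame_layout(use_short_blocks):
    """Return (number of blocks, coefficients per block) for one frame."""
    if use_short_blocks:
        return SHORT_BLOCKS_PER_GROUP, coefficients_per_block(SHORT_WINDOW)
    return 1, coefficients_per_block(LONG_WINDOW)

for use_short in (False, True):
    blocks, coeffs = frame_layout(use_short)
    print(blocks, coeffs, blocks * coeffs)
# prints:
# 1 1024 1024
# 8 128 1024
```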
Spectral Processing
In MPEG Layer III coding the compression gain is mainly achieved through the unequal
distribution of energy in the different frequency bands, the use of the psychoacoustic model,
and Huffman coding. The unequal distribution of energy allows use of fewer bits for spectral
bands with less energy. The psychoacoustic model is used to adjust the quantization step size
in a way that masks the quantization noise. Huffman coding allows further reductions in the
bit rate. All these approaches are also used in the AAC algorithm. In addition, the algorithm
makes use of prediction to reduce the dynamic range of the coefficients and thus allow further
reduction in the bit rate.
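The effect of prediction on dynamic range can be illustrated with a toy example. The sketch below is not the predictor used in AAC. It assumes the simplest possible predictor, one that predicts each value as the previous value, and applies it to two made-up sequences standing in for one coefficient tracked across successive blocks. The names `prediction_residual` and `peak` are hypothetical.

```python
import math

def prediction_residual(values):
    """Residual of a previous-value predictor: e[t] = x[t] - x[t-1].

    The first sample has no prediction and is left out of the residual.
    """
    return [values[t] - values[t - 1] for t in range(1, len(values))]

def peak(values):
    """Largest magnitude in the sequence."""
    return max(abs(v) for v in values)

# Slowly varying (roughly stationary) sequence: large values, small changes.
stationary = [100.0 + math.sin(0.1 * t) for t in range(50)]
# Transient: an abrupt jump halfway through.
transient = [0.0] * 25 + [100.0] * 25

print(peak(stationary) > 100.0)                     # prints True
print(peak(prediction_residual(stationary)) < 0.2)  # prints True
print(peak(prediction_residual(transient)))         # prints 100.0
```

For the slowly varying sequence the residual is far smaller than the values themselves, so it can be represented with fewer bits. For the abrupt jump the residual is as large as the jump, so this predictor gives no benefit there.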
Recall that prediction is generally useful only in stationary conditions. By their very nature,
transients are almost impossible to predict. Therefore, generally speaking, predictive coding
 