Audio Coding - Introduction to Data Compression


to trigger switching in the block length of the MDCT transform and to produce the threshold

values used to determine scalefactors and quantization thresholds. The audio data is fed in

parallel to both the acoustic model and to the modified discrete cosine transform.

Block Switching and MDCT

Because the AAC algorithm is not backward compatible, it does away with the requirement of

the 32-band filter bank. Instead, the frequency decomposition is accomplished by a modified

discrete cosine transform (MDCT). The MDCT is described in Chapter 13. The AAC algorithm

allows switching between a window length of 2048 samples and 256 samples. These window

lengths include a 50% overlap with neighboring blocks. So 2048 time samples are used to

generate 1024 spectral coefficients, and 256 time samples are used to generate 128 frequency

coefficients. The $k$th spectral coefficient of block $i$, $X_{i,k}$, is given by

$$X_{i,k} = 2 \sum_{n=0}^{N-1} z_{i,n} \cos\left(\frac{2\pi (n + n_o)}{N}\left(k + \frac{1}{2}\right)\right)$$

where $z_{i,n}$ is the $n$th time sample of the $i$th block, $N$ is the window length, and

$$n_o = \frac{N/2 + 1}{2}$$
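To make the transform concrete, the following is a minimal sketch that evaluates the expression for the spectral coefficients directly with NumPy. The function name mdct_block is a label chosen here, not one from the standard, and the block is assumed to hold whatever time samples the encoder feeds to the transform. Direct evaluation costs on the order of N^2 operations, so it is meant for checking the arithmetic rather than as a practical implementation.

```python
import numpy as np

def mdct_block(z):
    """Evaluate X[k] = 2 * sum_n z[n] * cos(2*pi*(n + n0)*(k + 1/2)/N)
    for k = 0, ..., N/2 - 1, where N = len(z) and n0 = (N/2 + 1)/2."""
    z = np.asarray(z, dtype=float)
    N = len(z)
    if N % 2 != 0:
        raise ValueError("window length N must be even")
    n0 = (N / 2 + 1) / 2
    n = np.arange(N)          # time index within the block
    k = np.arange(N // 2)     # N time samples give N/2 coefficients
    # phase[k, n] = 2*pi*(n + n0)*(k + 1/2)/N
    phase = (2 * np.pi / N) * np.outer(k + 0.5, n + n0)
    return 2 * (np.cos(phase) @ z)

# Test signal (an arbitrary sinusoid, used only to exercise the function).
t = np.arange(2048)
signal = np.sin(2 * np.pi * 0.01 * t)

long_coeffs = mdct_block(signal)          # long window: 2048 samples
short_coeffs = mdct_block(signal[:256])   # short window: 256 samples
print(long_coeffs.shape, short_coeffs.shape)
```

The two shapes printed are the 1024 and 128 coefficient counts quoted for the long and short windows.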

The longer block length allows the algorithm to take advantage of stationary portions of the

input to get significant improvements in compression. The short block length allows the

algorithm to handle sharp attacks without incurring substantial distortion and rate penalties.

Short blocks occur in groups of eight in order to avoid framing issues. As in the case of MPEG

Layer III, there are four kinds of windows: long, short, start, and stop. The decision about

whether to use a group of short blocks is made by the psychoacoustic model. The coefficients

are divided into scalefactor bands in which the number of coefficients in the bands reflects the

critical bandwidth. Each scalefactor band is assigned a single scalefactor. The exact division

of the coefficients into scalefactor bands for the different windows and different sampling rates

is specified in the standard [221].
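The grouping of coefficients into scalefactor bands can be sketched as follows. The band edges used here are hypothetical values chosen only to show bands that widen with frequency; the actual tables depend on the window type and sampling rate and must be taken from the standard. The per-band peak magnitude is likewise only an illustrative stand-in for "one value per band" and is not how the encoder derives its scalefactors.

```python
import numpy as np

# HYPOTHETICAL band edges for a 1024-coefficient long block (not the
# standard's tables). Band b covers coefficients EDGES[b] .. EDGES[b+1]-1.
EDGES = np.array([0, 4, 8, 12, 16, 24, 32, 48, 64, 96, 128,
                  192, 256, 384, 512, 768, 1024])

def band_widths(edges):
    """Number of coefficients in each band."""
    return np.diff(edges)

def per_band_peak(coeffs, edges):
    """Return one value per band: the largest magnitude in that band."""
    mags = np.abs(np.asarray(coeffs, dtype=float))
    if len(mags) != edges[-1]:
        raise ValueError("coefficient count must match the last band edge")
    # reduceat takes the maximum over mags[edges[b]:edges[b+1]] for each band
    return np.maximum.reduceat(mags, edges[:-1])

coeffs = np.random.default_rng(0).standard_normal(1024)
peaks = per_band_peak(coeffs, EDGES)
print(len(peaks), band_widths(EDGES))
```

With these assumed edges the low-frequency bands hold four coefficients each while the highest band holds 256, mirroring the way critical bandwidth grows with frequency.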

Spectral Processing

In MPEG Layer III coding the compression gain is mainly achieved through the unequal

distribution of energy in the different frequency bands, the use of the psychoacoustic model,

and Huffman coding. The unequal distribution of energy allows use of fewer bits for spectral

bands with less energy. The psychoacoustic model is used to adjust the quantization step size

in a way that masks the quantization noise. Huffman coding allows further reductions in the

bit rate. All these approaches are also used in the AAC algorithm. In addition, the algorithm

makes use of prediction to reduce the dynamic range of the coefficients and thus allow further

reduction in the bit rate.
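The effect of prediction on dynamic range can be illustrated with a small sketch. It follows one spectral coefficient across successive blocks and predicts each value from the previous one. This first-order predictor with a fixed coefficient a is an assumption made for illustration; the passage does not specify the predictor used in AAC, and the sample values below are invented.

```python
import numpy as np

def prediction_residual(track, a=1.0):
    """Residual e[i] = x[i] - a * x[i-1]; the first value is kept as is."""
    track = np.asarray(track, dtype=float)
    residual = track.copy()
    residual[1:] -= a * track[:-1]
    return residual

def reconstruct(residual, a=1.0):
    """Invert prediction_residual: x[i] = e[i] + a * x[i-1]."""
    residual = np.asarray(residual, dtype=float)
    out = np.empty_like(residual)
    out[0] = residual[0]
    for i in range(1, len(residual)):
        out[i] = residual[i] + a * out[i - 1]
    return out

# Invented values of one coefficient over eight blocks of a stationary input.
track = 100.0 + np.array([0.0, 1.0, 0.5, -0.5, 1.5, 0.0, -1.0, 0.5])
residual = prediction_residual(track)

print("largest coefficient magnitude:", np.max(np.abs(track)))
print("largest residual magnitude:   ", np.max(np.abs(residual[1:])))
```

For a slowly varying track such as this one, the residuals are far smaller in magnitude than the coefficients themselves, so they can be represented with fewer bits. If the values jumped abruptly from block to block, as they would during a transient, the residuals could be as large as the coefficients or larger.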

Recall that prediction is generally useful only in stationary conditions. By their very nature,

transients are almost impossible to predict. Therefore, generally speaking, predictive coding
