Audio Coding - Introduction to Data Compression

would not be considered for signals containing significant amounts of transients. However,

music signals have exactly this characteristic. Although they may contain long periods of

stationary signals, they also generally contain a significant amount of transient signals. The

AAC algorithm makes clever use of the time frequency duality to handle this situation. The

standard contains two kinds of predictors: an intrablock predictor, referred to as Temporal

Noise Shaping (TNS), and an interblock predictor. The interblock predictor is used during

stationary periods. During these periods it is reasonable to assume that the coefficients at a

certain frequency do not change their value significantly from block to block. Making use

of this characteristic, the AAC standard implements a set of parallel DPCM systems. There

is one predictor for each coefficient up to a maximum number of coefficients. The maximum

is different for different sampling frequencies. Each predictor is a backward adaptive

two-tap predictor. This predictor is really useful only in stationary periods. Therefore, the

psychoacoustic model monitors the input and determines when the output of the predictor is

to be used. The decision is made separately for each scalefactor band. Because the decoder

must be told in which bands the predictors are being used, this adds one bit of side information

to the rate for each scalefactor band. Therefore, once the preliminary

decision to use the predicted value has been made, further calculations are made to check if

the savings will be sufficient to offset this increase in rate. If the savings are determined to

be sufficient, a predictor_data_present bit is set to 1 and one bit for each scalefactor band

(called the prediction_used bit) is set to 1 or 0 depending on whether prediction was deemed

effective for that scalefactor band. If not, the predictor_data_present bit is set to 0 and the

prediction_used bits are not sent. Even when a predictor is disabled, the adaptive algorithm

is continued so that the predictor coefficients can track the changing coefficients. However,

because this is a streaming audio format it is necessary from time to time to reset the

coefficients. Resetting is done periodically in a staged manner and also when a short frame

is used.
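
The interblock scheme described above can be sketched as follows. This is a minimal illustration, not the procedure in the AAC standard: the normalized-LMS update, the class name TwoTapPredictor, the functions estimated_bit_saving and decide_prediction, and the energy-ratio estimate of the bit savings are all assumptions made for the example. Only the two-tap backward adaptive structure, the per-band prediction_used bits, and the predictor_data_present bit come from the text.

```python
import math
from dataclasses import dataclass, field


@dataclass
class TwoTapPredictor:
    """Backward-adaptive two-tap predictor for one MDCT coefficient index.

    The normalized-LMS update is an assumption made for this sketch; it is
    not the adaptation rule specified by the AAC standard.
    """
    mu: float = 0.1
    weights: list = field(default_factory=lambda: [0.0, 0.0])
    history: list = field(default_factory=lambda: [0.0, 0.0])

    def predict(self) -> float:
        return self.weights[0] * self.history[0] + self.weights[1] * self.history[1]

    def update(self, reconstructed: float) -> None:
        # Adapt from reconstructed values only, so a decoder can mirror it.
        error = reconstructed - self.predict()
        energy = self.history[0] ** 2 + self.history[1] ** 2 + 1e-9
        self.weights = [w + self.mu * error * h / energy
                        for w, h in zip(self.weights, self.history)]
        self.history = [reconstructed, self.history[0]]

    def reset(self) -> None:
        self.weights = [0.0, 0.0]
        self.history = [0.0, 0.0]


def estimated_bit_saving(original, residual) -> float:
    """Rough saving in bits from coding the residual instead of the original.

    Uses 0.5 * log2(energy ratio) per coefficient, a hypothetical stand-in
    for the encoder's actual rate calculation.
    """
    e_orig = sum(x * x for x in original) + 1e-12
    e_res = sum(x * x for x in residual) + 1e-12
    return 0.5 * len(original) * math.log2(e_orig / e_res)


def decide_prediction(coeffs, predicted, band_edges):
    """Return (predictor_data_present, prediction_used bits).

    band_edges is a list of (start, stop) index pairs, one per scalefactor band.
    """
    prediction_used = []
    total_saving = 0.0
    for start, stop in band_edges:
        original = coeffs[start:stop]
        residual = [c - p for c, p in zip(original, predicted[start:stop])]
        saving = estimated_bit_saving(original, residual)
        use = saving > 0.0
        prediction_used.append(1 if use else 0)
        if use:
            total_saving += saving
    side_info_bits = len(band_edges)  # one prediction_used bit per band
    if total_saving > side_info_bits:
        return 1, prediction_used
    return 0, []  # prediction_used bits are not sent
```

In such a sketch, update would be called for every block whether or not prediction is signaled for the band, so that the predictors keep tracking the signal, and reset would be called at the periodic reset points and when a short frame is used.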

When the audio input contains transients, the AAC algorithm uses the intrablock predictor (TNS).

Recall that narrow pulses in time correspond to wide bandwidths. The narrower a signal in time,

the broader its Fourier transform will be. This means that when transients occur in the audio

signal, the resulting MDCT output will contain a large number of correlated coefficients. Thus,

unpredictability in time translates to a high level of predictability in terms of the frequency

components. The AAC algorithm uses neighboring coefficients to perform prediction. A target set of

coefficients is selected in the block. The standard suggests a range of 1.5 kHz to the uppermost

scalefactor band as specified for different profiles and sampling rates. A set of linear predictive

coefficients is obtained using any of the standard approaches, such as the Levinson-Durbin

algorithm described in Chapter 18. The maximum order of the filter ranges from 12 to 20

depending on the profile. The process of obtaining the filter coefficients also provides the

expected prediction gain g_p. This expected prediction gain is compared against a threshold to

determine if intrablock prediction is going to be used. The standard suggests a value of 1.4 for

the threshold. The order of the filter is determined by the first partial correlation (PARCOR)

coefficient with a magnitude smaller than a threshold (suggested to be 0.1). The PARCOR

coefficients corresponding to the predictor are quantized and coded for transfer to the decoder.
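
The analysis steps above can be sketched as follows, under several stated assumptions. The input is taken to be the target set of MDCT coefficients already selected from the block. The autocorrelation method, the function names, the sign convention of the PARCOR coefficients, and the default maximum order of 12 are choices made for this example. Truncating the filter at the first PARCOR coefficient with magnitude below 0.1 is one reading of the rule stated in the text. Quantization and coding of the PARCOR coefficients are omitted.

```python
def autocorrelation(x, max_lag):
    """Biased autocorrelation of x across frequency index, lags 0..max_lag."""
    n = len(x)
    return [sum(x[i] * x[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Levinson-Durbin recursion.

    Returns (parcor, gain), where gain is r[0] divided by the final
    prediction error energy. Sign convention: error filter 1 + sum a_i z^-i.
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    parcor = []
    for m in range(1, order + 1):
        if err <= 1e-12 * r[0]:
            break  # signal is (numerically) perfectly predictable
        acc = r[m] + sum(a[i] * r[m - i] for i in range(1, m))
        k = -acc / err
        parcor.append(k)
        updated = a[:]
        for i in range(1, m):
            updated[i] = a[i] + k * a[m - i]
        updated[m] = k
        a = updated
        err *= 1.0 - k * k
    gain = r[0] / max(err, 1e-12 * r[0])
    return parcor, gain


def tns_decision(target_coeffs, max_order=12, gain_threshold=1.4,
                 parcor_threshold=0.1):
    """Return (use_tns, parcor_coefficients_to_transmit)."""
    r = autocorrelation(target_coeffs, max_order)
    if r[0] <= 0.0:
        return False, []
    parcor, gain = levinson_durbin(r, max_order)
    if gain <= gain_threshold:
        return False, []
    # Filter order: index of the first PARCOR coefficient below the threshold.
    order = len(parcor)
    for i, k in enumerate(parcor):
        if abs(k) < parcor_threshold:
            order = i
            break
    return True, parcor[:order]
```

As a usage example, a target set whose coefficients decay smoothly with frequency index, such as `[0.9 ** n for n in range(64)]`, is strongly correlated from one coefficient to the next and passes the gain test with a first-order filter, whereas a single isolated spectral line such as `[1.0] + [0.0] * 63` yields no prediction gain and leaves the tool switched off.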

The reconstructed LPC coefficients are then used for prediction. In time-domain predictive

coders, one effect of linear prediction is the spectral shaping of the quantization noise. The

effect of prediction in the frequency domain is the temporal shaping of the quantization noise,
