would not be considered for signals containing significant amounts of transients. However,
music signals have exactly this characteristic. Although they may contain long periods of
stationary signals, they also generally contain a significant amount of transient signals. The
AAC algorithm makes clever use of the time frequency duality to handle this situation. The
standard contains two kinds of predictors: an intrablock predictor, referred to as Temporal
Noise Shaping (TNS), and an interblock predictor. The interblock predictor is used during
stationary periods. During these periods it is reasonable to assume that the coefficients at a
certain frequency do not change their value significantly from block to block. Making use
of this characteristic, the AAC standard implements a set of parallel DPCM systems. There
is one predictor for each coefficient up to a maximum number of coefficients; this maximum
depends on the sampling frequency. Each predictor is a backward adaptive
two-tap predictor. This predictor is really useful only in stationary periods. Therefore, the
psychoacoustic model monitors the input and determines when the output of the predictor is
to be used. The decision is made on a scalefactor band by scalefactor band basis. Because
notification of the decision that the predictors are being used has to be sent to the decoder, this
would increase the rate by one bit for each scalefactor band. Therefore, once the preliminary
decision to use the predicted value has been made, further calculations are made to check if
the savings will be sufficient to offset this increase in rate. If the savings are determined to
be sufficient, a predictor_data_present bit is set to 1 and one bit for each scalefactor band
(called the prediction_used bit) is set to 1 or 0 depending on whether prediction was deemed
effective for that scalefactor band. If not, the predictor_data_present bit is set to 0 and the
prediction_used bits are not sent. Even when a predictor is disabled, its adaptation
continues so that the predictor coefficients can keep tracking the changing spectral
coefficients. However, because this is a streaming audio format, the predictor coefficients
must be reset from time to time. Resetting is done periodically in a staged manner and also when a short frame
is used.
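The rate check described above can be sketched as follows. This is only an illustration under stated assumptions: the function name and the per-band bit estimates (bits_plain, bits_predicted) are hypothetical inputs, not quantities defined by the standard, and the side-information cost is taken to be exactly one bit per scalefactor band, as in the text.

```python
def prediction_side_info(bits_plain, bits_predicted):
    """Decide whether interblock prediction is worth signalling.

    bits_plain[b]     -- assumed bit estimate for band b without prediction
    bits_predicted[b] -- assumed bit estimate for band b with prediction
    Both lists are hypothetical encoder-side estimates.
    """
    if len(bits_plain) != len(bits_predicted):
        raise ValueError("one estimate per scalefactor band is required")

    # Preliminary per-band decision: use prediction where it lowers the cost.
    prediction_used = [
        1 if predicted < plain else 0
        for plain, predicted in zip(bits_plain, bits_predicted)
    ]

    # Total saving over the bands that would use prediction.
    savings = sum(
        plain - predicted
        for plain, predicted, used in zip(bits_plain, bits_predicted, prediction_used)
        if used
    )

    # Signalling costs one prediction_used bit per scalefactor band.
    side_info_bits = len(bits_plain)

    if savings > side_info_bits:
        return {"predictor_data_present": 1, "prediction_used": prediction_used}
    # Not worth it: the per-band flags are not sent at all.
    return {"predictor_data_present": 0, "prediction_used": None}
```

For example, with estimates of [40, 30, 25, 20] bits without prediction and [30, 32, 20, 21] bits with it, the saving of 15 bits in the first and third bands exceeds the 4 bits of side information, so the flag is set and the per-band bits are sent.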
When the audio input contains transients, the AAC algorithm uses the intrablock predictor.
Recall that narrow pulses in time correspond to wide bandwidths. The narrower a signal in time,
the broader its Fourier transform will be. This means that when transients occur in the audio
signal, the resulting MDCT output will contain a large number of correlated coefficients. Thus,
unpredictability in time translates to a high level of predictability in terms of the frequency
components. The AAC uses neighboring coefficients to perform prediction. A target set of
coefficients is selected in the block. The standard suggests a range of 1.5 kHz to the uppermost
scalefactor band as specified for different profiles and sampling rates. A set of linear predictive
coefficients is obtained using any of the standard approaches, such as the Levinson-Durbin
algorithm described in Chapter 18. The maximum order of the filter ranges from 12 to 20
depending on the profile. The process of obtaining the filter coefficients also provides the
expected prediction gain g_p. This expected prediction gain is compared against a threshold to
determine if intrablock prediction is going to be used. The standard suggests a value of 1.4 for
the threshold. The order of the filter is determined by the first partial correlation (PARCOR)
coefficient with a magnitude smaller than a threshold (suggested to be 0.1). The PARCOR
coefficients corresponding to the predictor are quantized and coded for transfer to the decoder.
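The gain test and order selection just described can be sketched in a few lines. This is a minimal illustration, not the encoder specified by the standard: the function names are hypothetical, the autocorrelation method is only one of the standard approaches, the maximum order of 12 is simply the low end of the range quoted above, and the order rule follows the wording of the text (stop at the first PARCOR coefficient whose magnitude falls below the threshold). Quantization of the PARCOR coefficients is not shown.

```python
def autocorrelation(x, max_lag):
    """Autocorrelation r[0..max_lag] of the sequence x."""
    n = len(x)
    return [
        sum(x[i] * x[i + lag] for i in range(n - lag))
        for lag in range(max_lag + 1)
    ]


def levinson_durbin(r, max_order):
    """Levinson-Durbin recursion; returns (parcor, prediction_gain)."""
    if r[0] <= 0.0:
        return [], 1.0           # all-zero input: nothing to predict
    a = [1.0]                    # error filter A(z) = 1 + a1 z^-1 + ...
    err = r[0]                   # prediction error energy
    parcor = []
    for m in range(1, max_order + 1):
        if err <= 0.0:
            break
        acc = r[m] + sum(a[i] * r[m - i] for i in range(1, m))
        k = -acc / err           # m-th PARCOR (reflection) coefficient
        parcor.append(k)
        a_ext = a + [0.0]
        a = [a_ext[i] + k * a_ext[m - i] for i in range(m + 1)]
        err *= 1.0 - k * k
    gain = r[0] / err if err > 0.0 else float("inf")
    return parcor, gain


def tns_decision(coeffs, max_order=12, gain_threshold=1.4, parcor_threshold=0.1):
    """Apply the gain and PARCOR thresholds to a target set of coefficients."""
    r = autocorrelation(coeffs, max_order)
    parcor, gain = levinson_durbin(r, max_order)
    if gain <= gain_threshold:
        return {"use_tns": False, "order": 0, "gain": gain, "parcor": []}
    # Order = index of the first PARCOR coefficient below the threshold.
    order = len(parcor)
    for i, k in enumerate(parcor):
        if abs(k) < parcor_threshold:
            order = i
            break
    return {"use_tns": order > 0, "order": order, "gain": gain,
            "parcor": parcor[:order]}
```

A strongly correlated input such as `[0.9 ** n for n in range(64)]` passes the gain test and selects a first-order filter, while a single isolated nonzero coefficient yields a gain of 1 and no prediction.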
The reconstructed LPC coefficients are then used for prediction. In time-domain predictive
coders, one effect of linear prediction is the spectral shaping of the quantization noise. The
effect of prediction in the frequency domain is the temporal shaping of the quantization noise,