entering the buffer falls. By running the buffer down, the coder can temporarily support a higher bit rate to handle
transients or difficult material.
Simply stated, the scale factor process is controlled so that the distortion spectrum has the same shape as the
masking threshold and the quantizing step size is controlled to make the level of the distortion spectrum as low as
possible within the allowed bit rate. If the bit rate allowed is high enough, the distortion products will be masked.
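The step-size control described above can be pictured as a search for the finest quantizer that still fits the bit budget. The following is a minimal sketch under stated assumptions, not the AAC rate loop itself: the names quantize, estimate_bits and smallest_step_within_budget are hypothetical, the bit estimate is a crude stand-in for the real entropy coder, and the per-band shaping to the masking threshold is not modeled.

```python
def quantize(coeffs, step):
    """Uniform quantizer: larger steps mean coarser values and more distortion."""
    return [int(round(c / step)) for c in coeffs]


def estimate_bits(quantized):
    """Hypothetical cost model (NOT the AAC Huffman tables):
    one bit per value plus two bits per magnitude bit."""
    return sum(1 + 2 * abs(v).bit_length() for v in quantized)


def smallest_step_within_budget(coeffs, bit_budget, step=0.01, growth=1.1,
                                max_iters=1000):
    """Start with a fine step and coarsen it until the estimate fits the budget.

    The first step that fits is the finest one tried, so the quantizing
    distortion is as low as this simple search can make it.
    """
    for _ in range(max_iters):
        quantized = quantize(coeffs, step)
        if estimate_bits(quantized) <= bit_budget:
            return step, quantized
        step *= growth
    raise ValueError("bit budget too small for this block")


# Illustrative frequency coefficients (made-up values).
coeffs = [4.0, -2.5, 1.0, 0.3, -0.1, 0.05]
step_loose, q_loose = smallest_step_within_budget(coeffs, bit_budget=30)
step_tight, q_tight = smallest_step_within_budget(coeffs, bit_budget=12)
```

With a tighter budget the search ends on a step at least as large as with a looser one, which is the trade between bit rate and distortion level that the text describes.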
4.23 MPEG-4 Audio
The audio coding options of MPEG-4 parallel the video coding in complexity. In the same way that the video coding
in MPEG-4 has moved in the direction of graphics with rendering taking place in the decoder, MPEG-4 audio
introduces structured audio in which audio synthesis takes place in the decoder, taking MPEG-4 audio into the
realm of interactive systems and virtual reality. It is now necessary to describe the audio of earlier formats as
natural sound, i.e. that which could have come from a microphone. Natural sound is well supported in MPEG-4 with
a development of AAC, which is described in the next section.
Like video coding, MPEG-4 audio coding may be object based. For example, instead of coding the waveforms of a
stereo mix, each sound source in the mix may become a sound object which is individually coded. At the decoder
each sound object is then supplied to the composition stage where it will be panned and mixed with other objects.
At present, no techniques exist to create audio objects from a natural stereo signal pair, but where the
audio source is synthetic, or is a mixdown of multiple natural tracks, object coding can be used with a saving in data
rate.
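The composition stage can be illustrated with a short sketch in which each decoded sound object is panned and summed into a stereo pair. This is an assumption-laden illustration rather than the MPEG-4 composition mechanism: the constant-power pan law, the function names pan_gains and compose, and the sample values are all hypothetical.

```python
import math


def pan_gains(pan):
    """Constant-power pan law (an assumed choice).

    pan runs from -1.0 (full left) to +1.0 (full right).
    """
    angle = (pan + 1.0) * math.pi / 4.0  # maps pan to 0 .. pi/2
    return math.cos(angle), math.sin(angle)


def compose(objects, length):
    """Mix decoded sound objects into left and right channels.

    objects is a list of (samples, pan) tuples; samples is a list of floats.
    """
    left = [0.0] * length
    right = [0.0] * length
    for samples, pan in objects:
        gain_l, gain_r = pan_gains(pan)
        for i, s in enumerate(samples[:length]):
            left[i] += gain_l * s
            right[i] += gain_r * s
    return left, right


# Two hypothetical decoded objects: a centred voice and a guitar panned right.
voice = [0.5, 0.5, 0.5, 0.5]
guitar = [0.2, -0.2, 0.2, -0.2]
left, right = compose([(voice, 0.0), (guitar, 1.0)], 4)
```

Because each object arrives separately, the pan positions could be changed at the decoder without re-encoding anything, which is not possible once the sources have been mixed to a stereo waveform.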
It is also possible to define virtual instruments at the decoder and then to make each of these play by transmitting a
score.
Speech coding is also well supported. Natural speech may be coded at very low bit rates where the goal is
intelligibility of the message rather than fidelity. This can be done with one of two tools: HVXC (Harmonic Vector
eXcitation Coding) or CELP (Code Excited Linear Prediction). Synthetic speech capability allows applications such
as text-to-speech (TTS). MPEG-4 has standardized transmission of speech information in IPA (International
Phonetic Alphabet) or as text.
4.24 MPEG-4 AAC
The MPEG-2 AAC coding tools described in section 4.21 are extended in MPEG-4. The additions are perceptual
noise substitution (PNS) and vector quantization. All coding schemes find noise difficult because it contains no
redundancy. Real audio program material may contain a certain proportion of noise from time to time, traditionally
requiring a high bit rate to transmit without artifacts.
However, experiments have shown that under certain conditions, the listener is unable to distinguish between the
original noise-like waveform and noise locally generated in the decoder. This is the basis for PNS. Instead of
attempting to code a difficult noise sequence, the PNS process will transmit only the amplitude of the noise, which the
decoder will then generate locally.
At the encoder, PNS will be selected if, over a certain range of frequencies, there is no dominant tone and the
time-domain waveform remains stable, i.e. there are no transients. On a scale factor band and group basis the
Huffman-coded symbols describing the frequency coefficients will be replaced by the PNS flag. At the decoder the missing
coefficients will be derived from random vectors. The amplitude of the noise is encoded in 1.5 dB steps with noise
energy parameters for each group and scale factor band.
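The decoder side of PNS can be sketched as filling a flagged band with random values and scaling them to the transmitted noise energy. This is a minimal sketch under assumptions, not the normative decoding procedure: the names noise_energy_from_index and pns_fill_band are hypothetical, and so is the mapping in which index 0 means unit energy and each step adds 1.5 dB of energy.

```python
import math
import random

STEP_DB = 1.5  # step size stated in the text


def noise_energy_from_index(index):
    """Assumed mapping: each index step changes the band energy by 1.5 dB."""
    return 10.0 ** (index * STEP_DB / 10.0)


def pns_fill_band(num_coeffs, energy_index, rng):
    """Replace the missing coefficients of one band with scaled random values."""
    raw = [rng.uniform(-1.0, 1.0) for _ in range(num_coeffs)]
    raw_energy = sum(x * x for x in raw)
    if raw_energy == 0.0:
        return [0.0] * num_coeffs  # degenerate draw, nothing to scale
    target = noise_energy_from_index(energy_index)
    scale = math.sqrt(target / raw_energy)
    return [x * scale for x in raw]


# A 16-coefficient band flagged for PNS with a hypothetical energy index of 4.
band = pns_fill_band(16, 4, random.Random(1))
band_energy = sum(x * x for x in band)
```

Only the energy index has to be transmitted for the band; the individual coefficient values are never sent, which is where the bit-rate saving comes from.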
In stereo applications, where PNS is used at the same time and frequency in both channels, the random processes
in each channel will be different in order to avoid creating a mid-stage noise object. PNS cannot be used with M/S
audio coding.
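The reason for using different random processes in each channel can be seen by comparing the correlation between channels in the two cases. The sketch below is illustrative only: the seeds, the band length and the helper names noise_band and correlation are assumptions, and uniform random values stand in for whatever generator a real decoder uses.

```python
import math
import random


def noise_band(n, rng):
    """Generate n substitute noise coefficients from the given generator."""
    return [rng.uniform(-1.0, 1.0) for _ in range(n)]


def correlation(a, b):
    """Normalized cross-correlation at zero lag."""
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    return num / den


n = 1024

# Same noise in both channels: fully correlated, heard as a central image.
shared = noise_band(n, random.Random(7))
same_corr = correlation(shared, shared)

# A separate random process per channel: the noise is essentially uncorrelated.
left = noise_band(n, random.Random(7))
right = noise_band(n, random.Random(8))
independent_corr = correlation(left, right)
```

Identical noise in both channels gives a correlation of one, whereas independent generators give a value close to zero, so the substituted noise stays diffuse rather than collapsing to the centre of the stereo image.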
 