In MPEG-2 AAC, the frequency coefficients (or their residuals) are quantized according to the adaptive bit-
allocation system and then Huffman coded. At low bit rates the heavy requantizing necessary will result in some
coefficients being in error. At bit rates below 16 kbit/s/channel an alternative coding scheme known as TwinVQ
(Transform Domain Weighted Interleaved Vector Quantization) may be used. Vector quantization, also known as
block quantization,[30] works on blocks of coefficients rather than the individual coefficients used by the Huffman
code. In vector quantizing, one transmitted symbol represents the state of a number of coefficients. In a lossless
system, this symbol would need as many bits as the sum of the wordlengths of the coefficients to be coded. In practice the symbol has
fewer bits because the coefficients are quantized (and thereby contain errors). The encoder will select a symbol
which minimizes the errors over the whole block.
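The nearest-codeword search at the heart of this process can be sketched as follows. The codebook below is random and the sizes (256 codewords, blocks of 4 coefficients) are illustrative assumptions only; a real TwinVQ coder uses trained, perceptually weighted codebooks, not random entries.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical codebook: 256 codewords (so each symbol is 8 bits), each
# covering a block of 4 coefficients. Random entries stand in for a real
# trained codebook purely to show the search mechanism.
CODEBOOK = rng.standard_normal((256, 4))

def vq_encode(block):
    """Return the index of the codeword minimizing squared error over the block."""
    errors = np.sum((CODEBOOK - block) ** 2, axis=1)
    return int(np.argmin(errors))

def vq_decode(index):
    """Look up the codeword; the result approximates the original block."""
    return CODEBOOK[index]

block = rng.standard_normal(4)
symbol = vq_encode(block)   # one transmitted symbol stands for 4 coefficients
approx = vq_decode(symbol)  # decoded block contains quantizing error
```

Note that the error is minimized over the block as a whole: an individual coefficient may be approximated poorly if that allows the rest of the block to fit better.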
Error minimization is assisted by interleaving at the encoder. After interleaving, adjacent coefficients in frequency
space are in different blocks. After the symbol look-up process in the decoder, de-interleaving is necessary to
return the coefficients to their correct frequencies. In TwinVQ the transmitted symbols have constant wordlength
because the vector table has a fixed size for a given bit rate. Constant size symbols have an advantage in the
presence of bit errors because it is easier to maintain synchronization.
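The interleave/de-interleave round trip described above might be outlined as follows; the block count and coefficient values are arbitrary assumptions for illustration. Striding the coefficients across the blocks ensures that adjacent frequencies never share a block.

```python
import numpy as np

def interleave(coeffs, n_blocks):
    """Distribute coefficients so that adjacent frequencies land in different blocks."""
    return [coeffs[i::n_blocks] for i in range(n_blocks)]

def deinterleave(blocks):
    """Decoder side: return coefficients to their original frequency order."""
    n = sum(len(b) for b in blocks)
    out = np.empty(n)
    for i, b in enumerate(blocks):
        out[i::len(blocks)] = b
    return out

coeffs = np.arange(12.0)            # stand-in frequency coefficients 0..11
blocks = interleave(coeffs, 4)      # neighbours 0,1,2,3 go to four different blocks
restored = deinterleave(blocks)     # back in frequency order at the decoder
```

Because each block now samples the whole spectrum rather than one region, the quantizing error of any single symbol is spread thinly across frequency instead of being concentrated in one band.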
[30] Gersho, A., Asymptotically optimal block quantization. IEEE Trans. Info. Theory, IT-25, No. 4, 373-380 (1979)
4.25 Compression in stereo and surround sound
Once hardware became economic, the move to digital audio was extremely rapid. One of the reasons for this was the
sheer sound quality available. When well engineered, the PCM digital domain does so little damage to the sound
quality that the problems of the remaining analog parts usually dominate. The one serious exception to this is lossy
compression which does not preserve the original waveform and must therefore be carefully assessed before being
used in high-quality applications.
In a monophonic system, all the sound is emitted from a single point and psychoacoustic masking operates to its
fullest extent. Audio compression techniques of the kind described above work well in mono. However, in
stereophonic (which in this context also includes surround sound) applications different criteria apply. In addition to
the timbral information describing the nature of the sound sources, stereophonic systems also contain spatial
information describing their location.
The greatest cause for concern is that in stereophonic systems, masking is not as effective. When two sound
sources are in physically different locations, the degree of masking is not as great as when they are co-sited.
Unfortunately all the psychoacoustic masking models used in today's compressors assume co-siting. When used in
stereo or surround systems, the artifacts of the compression process can be revealed. This was first pointed out by
the late Michael Gerzon who introduced the term unmasking to describe the phenomenon.
The hearing mechanism has an ability to concentrate on one of many simultaneous sound sources based on
direction. The brain appears to be able to insert a controllable time delay in the nerve signals from one ear with
respect to the other so that when sound arrives from a given direction the nerve signals from both ears are
coherent causing the binaural threshold of hearing to be 3-6 dB better than monaural at around 4 kHz. Sounds
arriving from other directions are incoherent and are heard less well. This is known as attentional selectivity , or
more colloquially as the cocktail party effect.[31]
Human hearing can locate a number of different sound sources simultaneously by constantly comparing excitation
patterns from the two ears with different delays. Strong correlation will be found where the delay corresponds to the
interaural delay for a given source. This delay-varying mechanism will take time and the ear is slow to react to
changes in source direction. Oscillating sources can only be tracked up to 2-3 Hz and the ability to locate bursts of
noise improves with burst duration up to about 700 milliseconds.
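The correlation-at-varying-delay model described above can be imitated numerically. This is only a sketch of the principle, not a model of the hearing mechanism; the sample rate, delay and noise signal are invented assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
fs = 48000                  # assumed sample rate, samples per second
true_delay = 20             # assumed interaural delay in samples (~0.4 ms)

source = rng.standard_normal(4096)
left = source.copy()
right = np.roll(source, true_delay)   # the far ear hears the same sound later

# Slide one ear's signal against the other over a range of trial delays and
# find the delay of maximum correlation, analogous to the brain's variable
# internal delay that makes the two nerve signals coherent.
max_lag = 48
lags = range(-max_lag, max_lag + 1)
corr = [np.dot(left, np.roll(right, -lag)) for lag in lags]
best = list(lags)[int(np.argmax(corr))]   # recovered interaural delay
```

The correlation peaks only at the delay corresponding to the source direction; signals from other directions do not line up at that delay and so contribute little, which is the essence of attentional selectivity.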
Monophonic systems prevent the use of either of these effects completely because the first version of all sounds
reaching the listener comes from the same loudspeaker. Stereophonic systems allow attentional selectivity to
function in that the listener can concentrate on specific sound sources in a reproduced stereophonic image with the
same facility as in the original sound. When two sound sources are spatially separated, if the listener uses
attentional selectivity to concentrate on one of them, the contributions from both ears will correlate. This means that
the contributions from the other sound will be decorrelated, reducing its masking ability significantly. Experiments
showed long ago that even technically poor stereo was always preferred to pristine mono. This is because we are
accustomed to sounds and reverberation coming from all different directions in real life and having them all
superimposed in a mono speaker convinces no-one, however accurate the waveform.
We live in a reverberant world which is filled with sound reflections. If we could separately distinguish every
different reflection in a reverberant room we would hear a confusing cacophony. In practice we hear very well in
reverberant surroundings, far better than microphones can, because of the transform nature of the ear and the way
in which the brain processes nerve signals. Because the ear has finite frequency discrimination ability in the form of