FIGURE 8.3 A model of monochromatic vision (blocks: light source, spatial low-pass filter, logarithmic nonlinearity).
How does this description of the human visual system relate to coding schemes? Notice
that the mind does not perceive everything the eye sees. We can use this knowledge to design
compression systems such that the distortion introduced by our lossy compression scheme is
not noticeable.
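The model in Figure 8.3 can be sketched in a few lines. This is a minimal one-dimensional illustration, assuming a moving-average (box) filter as a stand-in for the spatial low-pass stage and a natural logarithm for the nonlinearity. The function names `box_lowpass` and `perceived_brightness`, the window width, and the small offset `eps` (which keeps the logarithm finite at zero intensity) are hypothetical choices, not part of the original model.

```python
import math


def box_lowpass(signal, width=3):
    """Moving-average (box) filter: a crude stand-in for the spatial low-pass stage.

    Near the ends the window is truncated, so each output is the mean of the
    samples that actually fall inside it.
    """
    half = width // 2
    out = []
    for i in range(len(signal)):
        lo = max(0, i - half)
        hi = min(len(signal), i + half + 1)
        window = signal[lo:hi]
        out.append(sum(window) / len(window))
    return out


def perceived_brightness(intensities, width=3, eps=1e-6):
    """One-dimensional version of the monochromatic vision model.

    Stage 1: spatial low-pass filter (a box filter, which is an assumption).
    Stage 2: logarithmic nonlinearity; eps keeps log() finite at zero intensity.
    """
    blurred = box_lowpass(intensities, width)
    return [math.log(eps + x) for x in blurred]


if __name__ == "__main__":
    edge = [10.0, 10.0, 10.0, 100.0, 100.0, 100.0]  # a sharp step in light intensity
    print([round(v, 3) for v in perceived_brightness(edge)])

    # Equal intensity ratios give (almost) equal output differences after the log stage.
    low = perceived_brightness([20.0], width=1)[0] - perceived_brightness([10.0], width=1)[0]
    high = perceived_brightness([200.0], width=1)[0] - perceived_brightness([100.0], width=1)[0]
    print(round(low, 4), round(high, 4))
```

Applied to a sharp edge such as `[10, 10, 10, 100, 100, 100]`, the filter smears the step across neighbouring samples before the logarithm compresses it. Because the last stage is logarithmic, doubling the intensity adds roughly the same amount, log 2, to the output whether the starting level is 10 or 100.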
8.3.2 Auditory Perception
The ear is divided into three parts, creatively named the outer ear, the middle ear, and the
inner ear. The outer ear consists of the structure that directs the sound waves, or pressure
waves, to the tympanic membrane, or eardrum. This membrane separates the outer ear from
the middle ear. The middle ear is an air-filled cavity containing three small bones that provide
coupling between the tympanic membrane and the oval window, which leads into the inner
ear. The tympanic membrane and the bones convert the pressure waves in the air to acoustical
vibrations. The inner ear contains, among other things, a snail-shaped passage called the
cochlea that contains the transducers that convert the acoustical vibrations to nerve impulses.
The human ear can hear sounds from approximately 20 Hz to 20 kHz, a 1000:1 range of
frequencies. The range decreases with age; older people are usually unable to hear the higher
frequencies. As in vision, auditory perception has several nonlinear components. One is that
loudness is a function not only of the sound level, but also of the frequency. Thus, for example,
a pure 1 kHz tone presented at a 20 dB intensity level will have the same apparent loudness as a
50 Hz tone presented at a 50 dB intensity level. By plotting the amplitude of tones at different
frequencies that sound equally loud, we get a series of curves called the Fletcher-Munson
curves [107].
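The numbers in this paragraph can be checked with a little arithmetic. The sketch below assumes the usual definition of intensity level, L = 10 log10(I / I_ref) dB, so a level difference of ΔL dB corresponds to an intensity ratio of 10^(ΔL/10). The helper name `intensity_ratio_from_db` is hypothetical.

```python
import math


def intensity_ratio_from_db(delta_db: float) -> float:
    """Intensity (power) ratio corresponding to a level difference in dB.

    Assumes intensity level L = 10 * log10(I / I_ref), so a difference of
    delta_db decibels is a ratio of 10 ** (delta_db / 10).
    """
    return 10 ** (delta_db / 10)


# Equal-loudness example from the text: a 1 kHz tone at 20 dB sounds as loud
# as a 50 Hz tone at 50 dB.
extra_db = 50 - 20                            # the 50 Hz tone needs 30 dB more
ratio = intensity_ratio_from_db(extra_db)     # 10 ** 3 = 1000

# Audible range of roughly 20 Hz to 20 kHz.
f_low, f_high = 20.0, 20_000.0
span = f_high / f_low                         # 1000:1
octaves = math.log2(span)                     # about 9.97 octaves (each octave doubles f)

print(f"50 Hz tone needs {ratio:.0f}x the intensity of the 1 kHz tone")
print(f"audible range {span:.0f}:1, about {octaves:.2f} octaves")
```

By this arithmetic, the 50 Hz tone in the example needs 1000 times the intensity of the 1 kHz tone to sound equally loud, and the 1000:1 audible range spans just under 10 octaves.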
Another very interesting audio phenomenon is that of masking, where one sound blocks
out or masks the perception of another sound. The fact that one sound can drown out another
seems reasonable. What is not so intuitive about masking is that if we try to mask a
pure tone with noise, only the noise in a small frequency range around the tone being masked
contributes to the masking. This range of frequencies is called the critical band. For most
frequencies, when the noise just masks the tone, the ratio of the power of the tone to
the power of the noise in the critical band is a constant [108]. The width of the critical band
varies with frequency. These observations have led to modeling auditory perception as a bank of
band-pass filters. A number of other, more complicated masking phenomena also
lend support to this theory (see [108, 109] for more information). The limitations of auditory
perception play a major role in the design of audio compression algorithms. We will delve
further into these limitations when we discuss audio compression in Chapter 17.
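The critical-band description of masking can be turned into a small test. This is a sketch under several assumptions that go beyond the text: the critical bandwidth uses the empirical fit of Zwicker and Terhardt, 25 + 75(1 + 1.4(f/1000)^2)^0.69 Hz, swapped in here because the text gives no formula; each critical band is treated as an ideal rectangular band-pass filter centred on the tone; the noise is represented as a list of discrete `(frequency, power)` components; and the masking constant `k` is a hypothetical placeholder, since its value is not given above. All function names are illustrative.

```python
def critical_bandwidth_hz(f_hz: float) -> float:
    """Approximate critical bandwidth (Hz) at centre frequency f_hz.

    Uses the Zwicker and Terhardt empirical fit; this formula is an
    assumption, not something given in the text.
    """
    return 25.0 + 75.0 * (1.0 + 1.4 * (f_hz / 1000.0) ** 2) ** 0.69


def noise_power_in_band(noise, center_hz: float) -> float:
    """Total power of noise components inside the critical band around center_hz.

    The band is modelled as an ideal rectangular band-pass filter of width
    critical_bandwidth_hz(center_hz); components outside it are ignored.
    """
    half_width = critical_bandwidth_hz(center_hz) / 2.0
    return sum(power for freq, power in noise if abs(freq - center_hz) <= half_width)


def is_masked(tone_hz: float, tone_power: float, noise, k: float = 1.0) -> bool:
    """Return True if the tone is masked by the noise.

    The tone is treated as masked when tone_power / (in-band noise power)
    is at or below the constant k. The default k = 1.0 is a hypothetical
    placeholder; the real constant must be measured.
    """
    band_noise = noise_power_in_band(noise, tone_hz)
    if band_noise == 0.0:
        return False  # no noise in the critical band, so nothing can mask the tone
    return tone_power / band_noise <= k


if __name__ == "__main__":
    near_noise = [(950.0, 0.6), (1050.0, 0.6)]   # inside the band around 1 kHz
    far_noise = [(3000.0, 100.0)]                # strong, but far from the tone
    print(round(critical_bandwidth_hz(1000.0)))
    print(is_masked(1000.0, 1.0, near_noise))
    print(is_masked(1000.0, 1.0, far_noise))
    print(is_masked(1000.0, 1.0, near_noise + far_noise))
```

Noise components at 950 Hz and 1050 Hz fall inside the roughly 160 Hz critical band around a 1 kHz tone and mask it, while a far stronger component at 3 kHz lies outside the band and contributes nothing. The growth of `critical_bandwidth_hz` with frequency mirrors the observation that the width of the critical band varies with frequency.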
 