Directional hearing in the vertical plane (el-
evation) is dominated by monaural cues. These
stem from direction-dependent spectral filtering
caused by reflection and diffraction at the torso,
head, and pinnae. Each direction of incidence
(for instance, defined in terms of azimuth and
elevation) is related to a unique spectral filtering
for each individual. This spectral filtering can
be described by head-related transfer functions
(HRTFs). In addition to providing localization of
sounds in the vertical plane, these spectral cues
are also essential for resolving front-back confusions (Blauert, 2001). Pulkki (2001) reports that frequencies around 6 kHz are especially important for elevation perception.
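As a sketch of how the HRTF cues described above are applied in practice, the direction-dependent filtering for one incidence direction can be rendered by convolving a mono source with a pair of head-related impulse responses (the time-domain counterparts of HRTFs). The three-tap HRIRs below are made-up illustration values; real HRIRs are hundreds of samples long and measured per individual and per direction:

```python
import numpy as np

def spatialize(mono, hrir_left, hrir_right):
    """Render a mono signal at the direction encoded by an HRIR pair.

    Convolving the source with each ear's impulse response applies the
    direction-dependent spectral filtering caused by torso, head, and
    pinnae, yielding a two-channel (binaural) signal for headphones.
    """
    left = np.convolve(mono, hrir_left)
    right = np.convolve(mono, hrir_right)
    return np.stack([left, right])  # shape: (2, len(mono) + len(hrir) - 1)

# Toy example: a unit impulse as the "source" and fabricated 3-tap HRIRs.
mono = np.array([1.0, 0.0, 0.0])
out = spatialize(mono, np.array([0.5, 0.3, 0.1]), np.array([0.2, 0.4, 0.2]))
print(out.shape)  # (2, 5)
```

Because the two ears receive differently filtered copies of the same signal, comparing the resulting magnitude spectra is exactly the monaural/spectral cue the auditory system exploits for elevation and front-back disambiguation.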
In everyday situations, localization of sound
sources seldom relies on auditory cues alone.
Knowledge of the potential source of a sound (for
example, airplane noises from above, or crunching
shoes from below) aids in the localization process.
Visual cues heavily influence the localization of
sound sources.
CROSS-MODAL INTERACTION BETWEEN AUDIO AND VIDEO

Human perception in real-world situations is a multi-modal, recursive process. Stimuli from different modalities usually complement each other and make the perceptual process less ambiguous. Only those stimuli that can actually be perceived by the primary receptors for sound, light, pressure, and so on contribute to the overall impression that results from the perceptual process. Because of its complexity, the human perceptual process cannot easily be captured in a simple block diagram without neglecting important features. A number of descriptive models exist, but each covers only certain aspects of the process, depending on the level of abstraction at which the respective model is located.

Relatively little is known about the mechanisms of multi-modal processing in the human brain. The main questions with respect to audio-visual perception are: At what level of perceptual processing do cross-modal interactions occur? And what mechanism underlies them?

Joint Processing of Audio-Visual Stimuli

As early as 1909, Brodmann suggested a division of the cerebral cortex into 52 distinct regions, based on their histological characteristics (Brodmann, 1909). These areas, today called Brodmann areas, have since been associated with specific neural functions. The most important areas in the audio-visual game context are the primary visual cortex (V1), the visual association cortex (V2 and V3), and the primary auditory cortex together with the anterior and posterior transverse temporal areas (H). This division suggests that the different modalities are related to separate regions of the brain, and that stimuli are processed separately for each modality.

A closer look at the brain reveals that the neurons of the neocortex are arranged in six horizontal layers, parallel to the surface. The functional units of cortical activity are organized in groups of neurons. These are connected by four types of fibers, of which the association fibers are especially interesting when looking at information exchange between cortical areas. Short association fibers (called loops) connect adjacent gyri, whereas long association fibers form bundles that connect more distant gyri in the same hemisphere. These association bundles give fibers to and receive fibers from the overlying gyri along their routes, and they occupy most of the space underneath the cortex.

There are many such connections between different functional areas of the neocortex, so that information can be exchanged between them and true multi-modal processing can be achieved. Goldstein (2002) gives the example of a red, rolling ball entering our field of view. Locally distinct neurons are then activated by either motion, shape, or color. Subsequently, dorsal and