Biomedical Engineering Reference
FIGURE 7-19 Conversion of a 10 × 11 gray-scale image to sound. [Adapted from (Meijer 2006).]
sound patterns. From a theoretical perspective, this means that the available bandwidth is
not exploited optimally. However, by keeping the mapping as direct and simple as possible,
the risk of accidentally filtering out important clues is reduced.
The principles used for spatial encoding of the visual signal to a sound sequence are
shown graphically in Figure 7-19 for a gray-scale image. Image processing has reduced
the spatial resolution to only 110 pixels (10 × 11) and the intensity to only three gray-scale
levels (M = 10 rows, N = 11 columns, G = 3 gray levels). The mapping translates the vertical position of
each pixel into a frequency proportional to its position in the column, with the amplitude
proportional to the pixel brightness. The horizontal position maps to a delay after the start
time (denoted by a click) or to an amplitude difference in a stereo system.
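A minimal sketch of this pixel-to-tone mapping follows. It rests on assumptions that are not stated in the text: row 0 is the top of the image, the bottom row is assigned a reference frequency f0 and each row above it the next integer multiple, gray levels 0..G-1 are normalized to amplitudes 0..1, and the stereo cue is a simple linear pan. The function `pixel_to_tone` and its parameters are hypothetical.

```python
def pixel_to_tone(row, col, gray, M, N, G, T, f0):
    """Map one pixel to (frequency, amplitude, onset, (left_gain, right_gain)).

    Hypothetical helper. Assumes row 0 is the TOP of an M x N image,
    col 0 is the LEFT-most column, and gray is an integer level 0..G-1.
    """
    if not (0 <= row < M and 0 <= col < N and 0 <= gray < G):
        raise ValueError("pixel index or gray level out of range")
    # Vertical position -> frequency proportional to height:
    # bottom row (row M-1) gets f0, top row (row 0) gets M * f0.
    frequency = (M - row) * f0
    # Brightness -> amplitude, normalized to the range 0..1.
    amplitude = gray / (G - 1)
    # Horizontal position -> delay after the synchronization click.
    onset = col * T / N
    # Horizontal position -> stereo balance (0 = fully left, 1 = fully right).
    pan = col / (N - 1)
    return frequency, amplitude, onset, (1.0 - pan, pan)
```

For the 10 × 11, three-level image of Figure 7-19, with an assumed T = 1.1 s and f0 = 500 Hz, the top-right pixel at full brightness maps to a 5000 Hz tone of amplitude 1.0 that starts about 1.0 s after the click and is sent entirely to the right channel.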
Because the signals from all the pixels in a column are output simultaneously and there
are N columns, each column is sounded for T/N seconds, giving a complete scene update
every T seconds. For each column, j, every pixel is used to excite an associated sinusoidal
oscillator within the audible band, with the lowest frequencies at the bottom and the
highest at the top. In addition, the oscillator frequencies are all integer multiples of a
common reference frequency, so that the oscillators form an orthogonal basis. This ensures that all of the information
is preserved in this transformation from geometrical to Hilbert space (Meijer, 2006).
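The oscillator bank can be written out as a short worked formulation. The notation is assumed here rather than taken from the text: p_{ij} is the gray level of the pixel in row i (counted from the bottom, i = 1, ..., M) of column j, f_0 is the reference frequency, and τ = T/N is the duration of one column. Requiring f_0 τ to be an integer is a sufficient condition adopted for this sketch.

```latex
s_j(t) = \sum_{i=1}^{M} p_{ij}\,\sin(2\pi i f_0 t), \qquad 0 \le t < \tau = \frac{T}{N}

\int_0^{\tau} \sin(2\pi k f_0 t)\,\sin(2\pi l f_0 t)\,dt =
\begin{cases} \tau/2, & k = l \\ 0, & k \neq l \end{cases}
\qquad k, l \ge 1,\ f_0\tau \in \mathbb{Z}

p_{ij} = \frac{2}{\tau}\int_0^{\tau} s_j(t)\,\sin(2\pi i f_0 t)\,dt
```

The last line shows why no information is lost. Because the oscillators are orthogonal over the column interval, each gray level can be recovered by projecting the column sound onto its own oscillator. For example, τ = 0.1 s and f_0 = 500 Hz give exactly 50 cycles of the lowest tone per column.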
The M oscillator signals from column j are superimposed, and the resulting sound pattern is
presented to the ear for T/N seconds before the sound from column j + 1
is output. This continues until the N-th column has been converted after T seconds, when
the whole cycle is repeated for a new image. Images are separated by a synchronization
click, which is essential to ensure that the subject can reorientate laterally. In addition, the
relative amplitude of the sounds differs in the two ears to produce a stereo effect, so that
the sound appears to come from the correct direction.
As shown in Chapter 5, the spectrum of a sinusoidal signal with a duration of T/N seconds
includes other frequency components. These can be determined by convolving the Fourier
transform of the rectangular window with that of the sinusoidal signal. One important
consideration is that even the lowest frequency must be represented by a reasonable number
of complete cycles during the T/N-second observation time; otherwise, it would not be interpreted
correctly by the ear.
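The column-by-column synthesis and the cycle-count consideration can be sketched in Python. This is a minimal illustration under stated assumptions, not the vOICe implementation. The function names are hypothetical; gray levels are taken as 0..1; row 0 is the top of the image; oscillator frequencies are integer multiples of f0, starting at the bottom row; and the synchronization click and stereo panning are omitted. `inner_product` numerically checks the orthogonality of two oscillators over one column. `cycles_per_column` gives the number of periods of a tone that fit into the T/N-second rectangular window. For that window, the first spectral nulls around a tone at frequency f lie at f ± N/T.

```python
import math


def column_signal(amplitudes, f0, duration, fs):
    """Superimpose one sinusoid per pixel for a single column.

    amplitudes[0] belongs to the BOTTOM pixel (frequency f0);
    amplitudes[k] drives the oscillator at (k + 1) * f0.
    """
    n_samples = round(duration * fs)
    return [
        sum(a * math.sin(2 * math.pi * (k + 1) * f0 * n / fs)
            for k, a in enumerate(amplitudes))
        for n in range(n_samples)
    ]


def soundscape(image, f0, T, fs):
    """Scan an image left to right into one T-second sweep (no click, mono).

    image[i][j] is the gray level (0..1) of row i (row 0 = TOP), column j.
    """
    M, N = len(image), len(image[0])
    samples = []
    for j in range(N):
        # Reverse the column so that index 0 is the bottom (lowest-frequency) pixel.
        column = [image[i][j] for i in reversed(range(M))]
        samples.extend(column_signal(column, f0, T / N, fs))
    return samples


def cycles_per_column(f, T, N):
    """Number of cycles of a tone at frequency f heard during one column (T/N s)."""
    return f * T / N


def inner_product(f_a, f_b, duration, fs):
    """Riemann-sum estimate of the integral of sin(2*pi*f_a*t)*sin(2*pi*f_b*t) over [0, duration)."""
    n_samples = round(duration * fs)
    return sum(
        math.sin(2 * math.pi * f_a * n / fs) * math.sin(2 * math.pi * f_b * n / fs)
        for n in range(n_samples)
    ) / fs


if __name__ == "__main__":
    T, N, f0, fs = 1.1, 11, 500.0, 44100   # illustrative values (assumed)
    tau = T / N                             # 0.1 s per column
    print(cycles_per_column(f0, T, N))      # about 50 cycles of the lowest tone
    print(inner_product(500, 1000, tau, fs))  # close to 0: orthogonal
    print(inner_product(500, 500, tau, fs))   # close to tau / 2 = 0.05
```

With these assumed values, each column lasts 0.1 s and the lowest tone completes 50 cycles per column. Oscillators at 500 Hz and 1000 Hz are orthogonal to within rounding error. Plain Python loops keep the sketch dependency-free; an array library would be the natural choice for real-time use.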
This processing strategy has been implemented in a sensory substitution prosthesis
called the vOICe, with which, according to Meijer, it is possible to learn to sense instinctively
how the features of a soundscape correspond to objects in the physical world. Pat Fletcher,
a proficient user of the vOICe who could see until age 21, describes the sound images