In one way the Voder was actually superior to any human being.
Ordinarily it spoke in a firmly masculine baritone voice, but it could also
speak in tones ranging from the lowest bass to the highest soprano voice.
But the Voder was quite difficult to play. It took skilled telephone
operators six months to complete the course of 40 lessons devised to train
them on the use of the Voder, and a year to become proficient.
Nevertheless, to this day the device has been a major influence on the science
of speech synthesis, and the staff at Bell Labs still refer to it with reverence.
Franklin Cooper's Pattern Playback
A speech synthesis system of quite a different kind was completed in
1950 by Franklin Cooper at Haskins Laboratories in Connecticut, a
Yale-affiliated institute for speech research. The device generated speech
from a sound spectrogram which, like a musical score, is a visual
representation of the sounds.
The horizontal axis on the spectrogram (Figure 9) corresponds to
time (the further to the right you look along the time axis, the later it
is). The vertical axis corresponds to frequency (or pitch), with the
higher-pitched sounds appearing higher on the diagram. The dark patches on
the spectrogram indicate relatively intense sounds. A spectrogram
provides precise information about the sound because it is based on accurate
measurements of the changing frequency content of a sound over the
relevant period of time.
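
The layout described above can be reproduced digitally. The following is a minimal Python sketch, assuming NumPy is available, that slices a signal into short overlapping frames and measures the frequency content of each one. The function name, the frame length, the hop size and the test tone are all illustrative assumptions and are not taken from the text. In the resulting array the columns run along the time axis and the rows along the frequency axis, with larger values playing the role of the dark patches.

```python
import numpy as np

def spectrogram(signal, frame_len=256, hop=128):
    """Return a magnitude spectrogram of shape (n_freq_bins, n_frames).

    Hypothetical helper for illustration: columns are time, rows are frequency.
    """
    window = np.hanning(frame_len)  # taper each frame to reduce leakage
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([
        signal[i * hop : i * hop + frame_len] * window
        for i in range(n_frames)
    ])
    # Frequency content of each frame; transpose so time runs horizontally.
    return np.abs(np.fft.rfft(frames, axis=1)).T

# Assumed test signal: one second of sound that jumps from 500 Hz to 1500 Hz.
sample_rate = 8000
t = np.arange(sample_rate) / sample_rate
tone = np.where(t < 0.5,
                np.sin(2 * np.pi * 500 * t),
                np.sin(2 * np.pi * 1500 * t))

spec = spectrogram(tone)
freqs = np.fft.rfftfreq(256, d=1 / sample_rate)  # frequency of each row in Hz

# The strongest row in an early column and in a late column.
peak_early = freqs[spec[:, 5].argmax()]
peak_late = freqs[spec[:, -5].argmax()]
```

Read from left to right, the strongest row sits at the lower frequency in the early columns and at the higher frequency in the late ones, which is the pattern a printed spectrogram of this tone would show as a dark band stepping upward.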
Pictures of sound are useful for describing and transforming sounds.
Audio researchers often use visual representations of sound to gain a
better understanding of the components of the sound and to transform the
sound in some way, for example by adding an echo effect or changing the
pitch. Cooper realized that visual representations of sound can be turned
back into the original sounds themselves, a process of inversion that he
called Pattern Playback.
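
The idea of turning a picture back into sound can be illustrated with a rough digital analogue. The sketch below is not a description of Cooper's device. It assumes, purely for illustration, that each row of a pattern drives one sine oscillator at a fixed frequency and that each column lasts a fixed fraction of a second. The function name, the frequencies and the tiny hand-made pattern are hypothetical, and the approach ignores phase, so it only approximates a sound whose spectrogram resembles the pattern.

```python
import numpy as np

def play_pattern(pattern, freqs, sample_rate=8000, frame_dur=0.05):
    """Turn a (n_freqs, n_frames) intensity pattern into a waveform.

    Hypothetical helper: each row drives one sine oscillator, and each
    column of the pattern is held for frame_dur seconds.
    """
    samples_per_frame = int(sample_rate * frame_dur)
    # Hold each column's intensity for the duration of its frame.
    envelope = np.repeat(pattern, samples_per_frame, axis=1)
    t = np.arange(envelope.shape[1]) / sample_rate
    # One sine wave per row, at that row's frequency.
    oscillators = np.sin(2 * np.pi * np.outer(freqs, t))
    # Weight each oscillator by the pattern and mix them together.
    return (envelope * oscillators).sum(axis=0)

# Assumed "painted" pattern: 4 frequency rows by 10 time columns.
freqs = np.array([250.0, 500.0, 750.0, 1000.0])
pattern = np.zeros((4, 10))
pattern[1, :5] = 1.0  # a 500 Hz band during the first half
pattern[3, 5:] = 1.0  # a 1000 Hz band during the second half

wave = play_pattern(pattern, freqs)
```

Here the painted band in the second row produces a 500 Hz tone for the first quarter of a second, and the band in the fourth row produces a 1000 Hz tone for the next quarter of a second.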
In general this inverse process is not straightforward. The task is
essentially one of finding a sound waveform (a representation of the
sound wave's amplitude during a certain period of time) that comes
closest to generating the original picture (the spectrogram or “voiceprint”).
Cooper's device converted the spectrograms into sound, using either
photographic copies of actual spectrograms or synthetic patterns that had
been painted by hand on a cellulose acetate base (rather like the material
in a film negative). The speech sounds were recreated by passing light