FIGURE 18.2 The sound /e/ in test.
18.3.1 The Channel Vocoder
In the channel vocoder [227], each segment of input speech is analyzed using a bank of band-pass
filters called the analysis filters. The energy at the output of each filter is estimated at
fixed intervals and transmitted to the receiver. In a digital implementation, the energy estimate
may be the average squared value of the filter output. In analog implementations, this is the
sampled output of an envelope detector. Generally, an estimate is generated 50 times every
second. Along with the energy estimate for each filter output, a decision is made as to whether the
speech in that segment is voiced, as in the case of the sounds /a/, /e/, and /o/, or unvoiced, as in the
case of the sounds /s/ and /f/. Voiced sounds tend to have a pseudoperiodic structure, as seen in
Figure 18.2, which is a plot of the /e/ part of a male voice saying the word test. The period of
the fundamental harmonic is called the pitch period. The transmitter also forms an estimate of
the pitch period, which is transmitted to the receiver.
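The analysis step for a single channel can be sketched as follows. This is a minimal illustration, not the method of [227]: the function names, the 8000 Hz sampling rate, and the lag range are assumptions, and the text does not say how the pitch period is estimated, so a simple autocorrelation peak search is used here as a stand-in.

```python
import math

def frame_energies(filter_output, sample_rate, frames_per_second=50):
    """Average squared value of one analysis filter's output, per frame."""
    frame_len = sample_rate // frames_per_second  # 20 ms frames at 50 estimates/s
    energies = []
    for start in range(0, len(filter_output) - frame_len + 1, frame_len):
        frame = filter_output[start:start + frame_len]
        energies.append(sum(x * x for x in frame) / frame_len)
    return energies

def estimate_pitch_period(frame, min_lag, max_lag):
    """Lag in samples that maximizes the frame's autocorrelation.

    Assumption: autocorrelation is only one of several possible pitch
    estimators; the source does not name the one used.
    """
    best_lag, best_value = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        value = sum(frame[n] * frame[n + lag] for n in range(len(frame) - lag))
        if value > best_value:
            best_lag, best_value = lag, value
    return best_lag

# Usage: a 100 Hz tone stands in for the output of one analysis filter.
sample_rate = 8000  # assumed sampling rate
tone = [math.sin(2 * math.pi * 100 * n / sample_rate) for n in range(sample_rate)]
energies = frame_energies(tone, sample_rate)
period = estimate_pitch_period(tone[:400], min_lag=20, max_lag=160)
```

With these assumed settings, one second of signal yields 50 energy estimates, and the lag range of 20 to 160 samples corresponds to pitch frequencies between 400 Hz and 50 Hz.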
Unvoiced sounds tend to have a noiselike structure, as seen in Figure 18.3, which is the /s/
sound in the word test.
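The text does not say how the voiced/unvoiced decision is made. One common heuristic, shown here purely as an assumption, exploits the contrast just described: a noiselike segment changes sign far more often than a pseudoperiodic one. The function names and the threshold of 0.25 are hypothetical.

```python
import math
import random

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(
        1 for a, b in zip(frame, frame[1:]) if (a >= 0.0) != (b >= 0.0)
    )
    return crossings / (len(frame) - 1)

def is_voiced(frame, zcr_threshold=0.25):
    """Heuristic decision: few sign changes suggests a voiced segment.

    Assumption: the threshold is illustrative, not taken from the source.
    """
    return zero_crossing_rate(frame) < zcr_threshold

# Usage: a 100 Hz tone versus uniform noise, 20 ms each at an assumed 8000 Hz.
sample_rate = 8000
rng = random.Random(0)
periodic = [math.sin(2 * math.pi * 100 * n / sample_rate) for n in range(160)]
noiselike = [rng.uniform(-1.0, 1.0) for _ in range(160)]
```

A practical detector would likely combine several cues, such as frame energy and the strength of the autocorrelation peak, rather than rely on zero crossings alone.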
At the receiver, the vocal tract filter is implemented by a bank of band-pass filters. The
bank of filters at the receiver, known as the synthesis filters, is identical to the bank of analysis
filters. Based on whether the speech segment is deemed to be voiced or unvoiced, either a
periodic pulse generator or a pseudonoise source, respectively, is used as the input to the synthesis filter bank.
The period of the pulse input is determined by the pitch estimate that the transmitter obtained for the
segment being synthesized. The input is scaled by the energy estimate at the output of the
analysis filters. A block diagram of the synthesis portion of the channel vocoder is shown in
Figure 18.4.
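The excitation side of the receiver can be sketched as follows. This is an illustration under stated assumptions: the function names are hypothetical, the band-pass synthesis filters themselves are omitted, the pulse train restarts at the beginning of each frame, and the gain is taken as the square root of the energy estimate so that it acts on amplitude (the text says only that the input is scaled by the energy estimate).

```python
import math
import random

def excitation(voiced, pitch_period, frame_len, rng):
    """Source signal for one frame: pulse train if voiced, noise otherwise."""
    if voiced:
        # One unit pulse every pitch_period samples.
        return [1.0 if n % pitch_period == 0 else 0.0 for n in range(frame_len)]
    # Pseudonoise source: uniform samples from a seeded generator.
    return [rng.uniform(-1.0, 1.0) for _ in range(frame_len)]

def channel_inputs(source, band_energies):
    """One scaled copy of the source per synthesis filter.

    Assumption: gain = sqrt(energy), since energy is a squared quantity.
    """
    return [[math.sqrt(e) * x for x in source] for e in band_energies]

# Usage: a 20 ms frame at an assumed 8000 Hz with a pitch period of 80 samples.
rng = random.Random(0)
voiced_source = excitation(True, 80, 160, rng)
unvoiced_source = excitation(False, 80, 160, rng)
inputs = channel_inputs(voiced_source, [4.0, 0.25])
```

Each list in `inputs` would then be passed through the corresponding synthesis filter, and the filter outputs summed to form the reconstructed speech.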
Since the introduction of the channel vocoder, a number of variations have been developed.
The channel vocoder matches the frequency profile of the input speech. There is no attempt
to reproduce the speech samples per se. However, not all frequency components of speech are
equally important. In fact, as the vocal tract is a tube of nonuniform cross section, it resonates
at a number of different frequencies. These frequencies are known as formants [119]. The
formant values change with different sounds; however, we can identify ranges in which they