Digital Signal Processing Reference
FIGURE 10.52 Speech synthesis algorithm with various modules.
2. Windowing . The speech waveform is decomposed into smaller frames using
the Hamming window. This suppresses the side lobes in the frequency domain.
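The windowing step can be sketched as follows. This is a minimal illustration, not the book's code: the frame length, hop size, and sampling rate are assumed values, and the function name `frame_signal` is made up for the example.

```python
import numpy as np

def frame_signal(x, frame_len=240, hop=120):
    """Split signal x into overlapping frames, each multiplied by a
    Hamming window to suppress spectral side lobes (assumed sizes)."""
    window = np.hamming(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.empty((n_frames, frame_len))
    for i in range(n_frames):
        frames[i] = x[i * hop : i * hop + frame_len] * window
    return frames

# Example: one second of a 440 Hz tone at an assumed 8 kHz rate
fs = 8000
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 440 * t)
frames = frame_signal(x)
```

The 50% overlap used here is a common choice; the tapered Hamming window reduces the discontinuities at frame edges that would otherwise leak energy into the side lobes.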
3. Levinson-Durbin algorithm . To calculate the LPC coefficients, the autocorrelation sequence of the speech frame is required. From it, the LPC coefficients can be obtained by solving

$$ r(i) = \sum_{k=1}^{p} a_k \, r(i-k), \qquad i = 1, \dots, p, $$

where r ( i ) and a_k represent the autocorrelation array and the LPC coefficients, respectively, and p is the prediction order.
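The recursion above can be sketched in code. This is a generic Levinson-Durbin implementation, not the book's listing; it uses the polynomial convention A(z) = 1 + a_1 z^-1 + ... + a_p z^-p, so the returned a_k are the negatives of the predictor coefficients in the equation above.

```python
import numpy as np

def levinson_durbin(r, p):
    """Solve the Toeplitz normal equations for p LPC coefficients from
    the autocorrelation array r (needs len(r) >= p + 1)."""
    a = np.zeros(p + 1)
    a[0] = 1.0
    err = r[0]                      # prediction error energy
    for m in range(1, p + 1):
        acc = r[m]
        for i in range(1, m):
            acc += a[i] * r[m - i]
        k = -acc / err              # reflection coefficient
        prev = a.copy()
        for i in range(1, m):
            a[i] = prev[i] + k * prev[m - i]
        a[m] = k
        err *= (1.0 - k * k)        # error shrinks at each order
    return a, err

# Example: r(i) = 0.5**i is the autocorrelation of a first-order AR source,
# so the order-2 solution should collapse to a single nonzero coefficient.
r = np.array([1.0, 0.5, 0.25])
a, err = levinson_durbin(r, p=2)
```

The recursion costs O(p^2) operations instead of the O(p^3) of a general linear solver, which is why it is the standard choice for LPC analysis.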
4. Residual signal . For synthesis of the artificial voice, the excitation is given by the residual signal, which is obtained by passing the input speech frame through an FIR filter. The residual serves as the excitation for both voiced and unvoiced frames. This simplifies the algorithm, because it removes the energy and frequency calculations that would otherwise be needed to make a voiced/unvoiced decision: even for an unvoiced excitation, whose source is a random signal, the same residual principle applies, since the residual obtained from an unvoiced frame is itself random.
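The FIR analysis filtering that yields the residual can be sketched as below; the function name and the test frame are assumptions for illustration. The filter taps are just the LPC polynomial coefficients (1, a_1, ..., a_p) in the A(z) convention.

```python
import numpy as np

def residual_signal(frame, a):
    """FIR analysis filtering: e[n] = sum_{k=0}^{p} a[k] * x[n-k],
    with a[0] == 1 (the LPC polynomial coefficients)."""
    return np.convolve(frame, a)[:len(frame)]

# Example: the frame 0.5**n is the impulse response of 1/(1 - 0.5 z^-1),
# so inverse-filtering it with A(z) = 1 - 0.5 z^-1 should give an impulse.
frame = np.array([1.0, 0.5, 0.25, 0.125])
e = residual_signal(frame, np.array([1.0, -0.5]))
```

When the LPC model fits the frame well, the residual is nearly flat-spectrum (impulse-like for voiced frames, noise-like for unvoiced ones), which is exactly why it works as a universal excitation.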
5. Speech synthesis . With the representation of the speech frame in the form of
the LPC filter coefficients and the excitation signal, speech can be synthesized.
This is done by passing the excitation signal (the residual signal) through an
IIR filter. The residual-generation and speech-synthesis modules imitate the vocal cords and the vocal tract of the human speech production system.
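The all-pole synthesis filter 1/A(z) can be sketched as a direct-form IIR recursion; the function name is an assumption, and the same A(z) = 1 + a_1 z^-1 + ... convention is used as in the analysis step.

```python
import numpy as np

def synthesize(excitation, a):
    """All-pole IIR synthesis: y[n] = e[n] - sum_{k=1}^{p} a[k] * y[n-k],
    with a[0] == 1, modeling the vocal tract."""
    p = len(a) - 1
    y = np.zeros(len(excitation))
    for n in range(len(excitation)):
        acc = excitation[n]
        for k in range(1, p + 1):
            if n >= k:
                acc -= a[k] * y[n - k]
        y[n] = acc
    return y

# Example: an impulse driving 1/(1 - 0.5 z^-1) yields 0.5**n
y = synthesize(np.array([1.0, 0.0, 0.0, 0.0]), np.array([1.0, -0.5]))
```

Note that this filter exactly undoes the FIR analysis filter of the previous step, so passing a frame's residual back through it reconstructs the frame.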
6. Accumulation and buffering . Since the speech is segmented into frames at the beginning, the synthesized frames must be concatenated back together. This is performed by the accumulation and buffering module.
7. Output . When the entire synthesized speech segment is obtained, it is played.
During playback, the data are down-sampled to 4 kHz to restore the intelligi-
bility of the speech.
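The last two modules can be sketched together. The 8 kHz synthesis rate and the naive decimation-by-2 are assumptions for illustration; a real implementation would apply an anti-aliasing low-pass filter before decimating to 4 kHz.

```python
import numpy as np

def accumulate_and_output(frames, decim=2):
    """Concatenate synthesized frames into one buffer, then decimate
    to the playback rate (assumed factor of 2, e.g. 8 kHz -> 4 kHz)."""
    speech = np.concatenate(frames)
    return speech[::decim]   # naive decimation; low-pass first in practice

out = accumulate_and_output([np.ones(4), np.zeros(4)])
```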