Digital Signal Processing Reference
FIGURE 10.52 Speech synthesis algorithm with various modules.
2. Windowing . The speech waveform is decomposed into smaller frames using
the Hamming window. This suppresses the side lobes in the frequency domain.
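The windowing step can be sketched as follows. This is a minimal illustration, not the book's code: the frame length, hop size, and sampling rate are assumed values, and the function name `frame_signal` is made up for the example.

```python
import numpy as np

def frame_signal(x, frame_len=240, hop=120):
    """Split signal x into overlapping frames, each multiplied by a
    Hamming window to suppress spectral side lobes (assumed sizes)."""
    window = np.hamming(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.empty((n_frames, frame_len))
    for i in range(n_frames):
        frames[i] = x[i * hop : i * hop + frame_len] * window
    return frames

# Example: one second of a 440 Hz tone at an assumed 8 kHz rate
fs = 8000
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 440 * t)
frames = frame_signal(x)
```

The 50% overlap used here is a common choice; the tapered Hamming window reduces the discontinuities at frame edges that would otherwise leak energy into the side lobes.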
3. Levinson-Durbin algorithm . To calculate the LPC coefficients, the autocorrelation sequence of the speech frame is required. From it, the LPC coefficients can be obtained by solving

$$ r(i) = \sum_{k=1}^{p} a_k \, r(i-k), \qquad i = 1, \dots, p, $$

where r ( i ) and a_k represent the autocorrelation array and the LPC coefficients, respectively, and p is the prediction order.
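The recursion above can be sketched in code. This is a generic Levinson-Durbin implementation, not the book's listing; it uses the polynomial convention A(z) = 1 + a_1 z^-1 + ... + a_p z^-p, so the returned a_k are the negatives of the predictor coefficients in the equation above.

```python
import numpy as np

def levinson_durbin(r, p):
    """Solve the Toeplitz normal equations for p LPC coefficients from
    the autocorrelation array r (needs len(r) >= p + 1)."""
    a = np.zeros(p + 1)
    a[0] = 1.0
    err = r[0]                      # prediction error energy
    for m in range(1, p + 1):
        acc = r[m]
        for i in range(1, m):
            acc += a[i] * r[m - i]
        k = -acc / err              # reflection coefficient
        prev = a.copy()
        for i in range(1, m):
            a[i] = prev[i] + k * prev[m - i]
        a[m] = k
        err *= (1.0 - k * k)        # error shrinks at each order
    return a, err

# Example: r(i) = 0.5**i is the autocorrelation of a first-order AR source,
# so the order-2 solution should collapse to a single nonzero coefficient.
r = np.array([1.0, 0.5, 0.25])
a, err = levinson_durbin(r, p=2)
```

The recursion costs O(p^2) operations instead of the O(p^3) of a general linear solver, which is why it is the standard choice for LPC analysis.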
4. Residual signal . For synthesis of the artificial voice, the excitation is given by the residual signal, which is obtained by passing the input speech frame through an FIR filter. The residual serves as the excitation for both voiced and unvoiced frames. This simplifies the algorithm, because it removes the energy and frequency calculations that would otherwise be needed to make a voiced/unvoiced decision: even for an unvoiced excitation, whose source is a random signal, the same residual principle applies, since the residual obtained from an unvoiced frame is itself random.
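The FIR analysis filtering that yields the residual can be sketched as below; the function name and the test frame are assumptions for illustration. The filter taps are just the LPC polynomial coefficients (1, a_1, ..., a_p) in the A(z) convention.

```python
import numpy as np

def residual_signal(frame, a):
    """FIR analysis filtering: e[n] = sum_{k=0}^{p} a[k] * x[n-k],
    with a[0] == 1 (the LPC polynomial coefficients)."""
    return np.convolve(frame, a)[:len(frame)]

# Example: the frame 0.5**n is the impulse response of 1/(1 - 0.5 z^-1),
# so inverse-filtering it with A(z) = 1 - 0.5 z^-1 should give an impulse.
frame = np.array([1.0, 0.5, 0.25, 0.125])
e = residual_signal(frame, np.array([1.0, -0.5]))
```

When the LPC model fits the frame well, the residual is nearly flat-spectrum (impulse-like for voiced frames, noise-like for unvoiced ones), which is exactly why it works as a universal excitation.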
5. Speech synthesis . With the representation of the speech frame in the form of
the LPC filter coefficients and the excitation signal, speech can be synthesized.
This is done by passing the excitation signal (the residual signal) through an
IIR filter. The residual-generation and speech-synthesis modules imitate the vocal cords and the vocal tract of the human speech production system.
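The all-pole synthesis filter 1/A(z) can be sketched as a direct-form IIR recursion; the function name is an assumption, and the same A(z) = 1 + a_1 z^-1 + ... convention is used as in the analysis step.

```python
import numpy as np

def synthesize(excitation, a):
    """All-pole IIR synthesis: y[n] = e[n] - sum_{k=1}^{p} a[k] * y[n-k],
    with a[0] == 1, modeling the vocal tract."""
    p = len(a) - 1
    y = np.zeros(len(excitation))
    for n in range(len(excitation)):
        acc = excitation[n]
        for k in range(1, p + 1):
            if n >= k:
                acc -= a[k] * y[n - k]
        y[n] = acc
    return y

# Example: an impulse driving 1/(1 - 0.5 z^-1) yields 0.5**n
y = synthesize(np.array([1.0, 0.0, 0.0, 0.0]), np.array([1.0, -0.5]))
```

Note that this filter exactly undoes the FIR analysis filter of the previous step, so passing a frame's residual back through it reconstructs the frame.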
6. Accumulation and buffering . Since the speech is segmented into frames at the beginning, the synthesized frames must be concatenated back together. This is performed by the accumulation and buffering module.
7. Output . When the entire synthesized speech segment is obtained, it is played.
During playback, the data are down-sampled to 4 kHz to restore the intelligi-
bility of the speech.
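The last two modules can be sketched together. The 8 kHz synthesis rate and the naive decimation-by-2 are assumptions for illustration; a real implementation would apply an anti-aliasing low-pass filter before decimating to 4 kHz.

```python
import numpy as np

def accumulate_and_output(frames, decim=2):
    """Concatenate synthesized frames into one buffer, then decimate
    to the playback rate (assumed factor of 2, e.g. 8 kHz -> 4 kHz)."""
    speech = np.concatenate(frames)
    return speech[::decim]   # naive decimation; low-pass first in practice

out = accumulate_and_output([np.ones(4), np.zeros(4)])
```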