Digital Signal Processing Reference
can be approximated by using the integrals of the component frequencies.
Moreover, LPC models the large variation in the speech magnitude spectrum
and simplifies the harmonic amplitude quantization.
8.2 Sinusoidal Analysis and Synthesis
Figure 8.1 depicts block diagrams of the sinusoidal analysis and synthesis
processes introduced by McAulay. The speech spectrum is estimated by
windowing the input speech signal with a Hamming window and then
computing the Discrete Fourier Transform (DFT). The frequencies,
amplitudes, and phases corresponding to the peaks of the magnitude spectrum
become the model parameters of the sinusoidal representation. Employing a
pitch-adaptive analysis window whose length is two and a half times the
average pitch period improves the accuracy of peak estimation. The
synthesizer generates the sine waves corresponding to the estimated
frequencies and phases, and scales them by the amplitudes. All the
sinusoids are then summed to produce the synthesized speech. The block edge
effects are smoothed out by applying overlap and add with a triangular
window. Overlap and add is effectively a simple interpolation technique
and, in sinusoidal synthesis, it requires the parameters to be updated at
least every 10-15 ms for good quality speech. At lower frame rates the
spectral peaks need to be properly aligned between the analysis frames to
form frequency tracks. The amplitudes of the frequency tracks are linearly
interpolated, and the instantaneous phases are generated using a cubic
polynomial [1] as shown in Figure 8.2.
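The track interpolation described above can be sketched as follows. This is
a minimal illustration, assuming the commonly used "maximally smooth" cubic
phase formulation: the cubic matches the measured phase and frequency at
both frame boundaries, and an integer M unwraps the end phase. The function
names, the argument order, and the use of frequencies in radians per sample
are assumptions made for this example and do not come from the text.

```python
import numpy as np

def cubic_phase_coeffs(w0, p0, w1, p1, T):
    """Cubic phase coefficients for one track segment (hypothetical helper).

    w0, w1: frequencies at the frame boundaries, in radians per sample.
    p0, p1: measured phases at the frame boundaries, in radians.
    T:      frame length in samples.
    """
    # Integer number of 2*pi turns that makes the phase track smoothest.
    M = np.round(((p0 + w0 * T - p1) + (w1 - w0) * T / 2) / (2 * np.pi))
    d = p1 - p0 - w0 * T + 2 * np.pi * M
    alpha = 3 * d / T**2 - (w1 - w0) / T
    beta = -2 * d / T**3 + (w1 - w0) / T**2
    return alpha, beta

def interpolate_track(a0, w0, p0, a1, w1, p1, T):
    """Synthesize T samples of one frequency track between two frames."""
    alpha, beta = cubic_phase_coeffs(w0, p0, w1, p1, T)
    n = np.arange(T)
    amp = a0 + (a1 - a0) * n / T                           # linear amplitude
    phase = p0 + w0 * n + alpha * n**2 + beta * n**3       # cubic phase
    return amp * np.cos(phase)
```

With these coefficients the phase at n = T equals p1 up to a multiple of
2*pi and its slope equals w1, so consecutive segments join without phase or
frequency discontinuities.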
[Figure 8.1, block diagrams. Sinusoidal speech analysis: input speech,
window, DFT, magnitude spectrum, peak picking (frequencies, amplitudes),
phases of the spectral peaks (phases). Sinusoidal speech synthesis:
frequencies, phases, amplitudes, sine wave generator, sum all sine waves,
overlap and add, synthetic speech.]
Figure 8.1 General sinusoidal analysis and synthesis
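The two block diagrams can be sketched in code as follows. This is a
minimal illustration rather than the original implementation: the function
names, the FFT size, the limit on the number of peaks, and the simple
local-maximum peak picker are all assumptions made for this example. Phases
are referenced to the first sample of each frame.

```python
import numpy as np

def analyze_frame(frame, fs, n_fft=1024, max_peaks=40):
    """Hamming window, DFT, and peak picking for one frame.

    Assumes n_fft >= len(frame). Returns peak frequencies (Hz),
    amplitudes, and phases (radians).
    """
    window = np.hamming(len(frame))
    spectrum = np.fft.rfft(frame * window, n_fft)
    mag = np.abs(spectrum)
    # A peak is a bin larger than its left neighbour and not smaller
    # than its right neighbour.
    inner = mag[1:-1]
    peaks = np.flatnonzero((inner > mag[:-2]) & (inner >= mag[2:])) + 1
    # Keep only the strongest peaks, then restore frequency order.
    strongest = np.argsort(mag[peaks])[::-1][:max_peaks]
    peaks = np.sort(peaks[strongest])
    freqs = peaks * fs / n_fft
    amps = 2.0 * mag[peaks] / window.sum()   # undo the window gain
    phases = np.angle(spectrum[peaks])
    return freqs, amps, phases

def synthesize_frame(freqs, amps, phases, fs, length):
    """Generate and sum the sine waves for one frame."""
    n = np.arange(length)
    frame = np.zeros(length)
    for f, a, p in zip(freqs, amps, phases):
        frame += a * np.cos(2 * np.pi * f * n / fs + p)
    return frame

def overlap_add(frames, hop):
    """Overlap and add frames of length 2*hop with a triangular window."""
    length = 2 * hop
    # Triangular window whose copies, shifted by hop, sum to one.
    window = 1.0 - np.abs(np.arange(length) - hop) / hop
    out = np.zeros(hop * (len(frames) + 1))
    for i, frame in enumerate(frames):
        out[i * hop:i * hop + length] += window * frame
    return out
```

At a sampling rate of 8 kHz, a hop of 80 samples corresponds to a parameter
update every 10 ms, which falls within the update interval that the text
gives for good quality overlap-and-add synthesis.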
 