Digital Signal Processing Reference
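[Block diagram: the frequencies and phases feed a cubic phase interpolation stage that produces the phase track θ(n); the amplitudes feed a linear interpolation stage that produces A(n). A sine wave generator forms each component from θ(n) and A(n), and all sine waves are summed to give the synthetic speech.]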
Figure 8.2 Sinusoidal synthesis with matched frequency tracks
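The synthesis structure in the figure can be sketched in a few lines of Python. This is a minimal illustration, not the implementation of any particular coder: it assumes the widely used cubic polynomial phase model, frequencies given in radians per sample, and tracks that have already been matched between two frame boundaries. The function names `cubic_phase_track` and `synthesize_frame` and the example values are hypothetical.

```python
import numpy as np


def cubic_phase_track(theta0, omega0, theta1, omega1, T):
    """Cubic phase interpolation across one frame of T samples.

    theta0, theta1: measured phases (rad) at the frame boundaries.
    omega0, omega1: measured frequencies (rad/sample) at the boundaries.
    Returns the phase at n = 0..T inclusive.
    """
    # Integer number of 2*pi wraps that gives the smoothest phase track.
    x = (theta0 + omega0 * T - theta1) + (omega1 - omega0) * T / 2.0
    M = np.round(x / (2.0 * np.pi))
    # Unwrapped phase mismatch that the quadratic and cubic terms absorb.
    d = theta1 + 2.0 * np.pi * M - theta0 - omega0 * T
    alpha = 3.0 * d / T**2 - (omega1 - omega0) / T
    beta = -2.0 * d / T**3 + (omega1 - omega0) / T**2
    n = np.arange(T + 1, dtype=float)
    return theta0 + omega0 * n + alpha * n**2 + beta * n**3


def synthesize_frame(amps0, amps1, freqs0, freqs1, phases0, phases1, T):
    """Sum all sine waves for one frame (samples n = 0..T-1)."""
    n = np.arange(T, dtype=float)
    speech = np.zeros(T)
    for a0, a1, w0, w1, p0, p1 in zip(amps0, amps1, freqs0, freqs1,
                                      phases0, phases1):
        amplitude = a0 + (a1 - a0) * n / T          # linear interpolation
        theta = cubic_phase_track(p0, w0, p1, w1, T)[:T]
        speech += amplitude * np.cos(theta)          # sine wave generator
    return speech


# Two matched tracks with made-up parameters, frame length 80 samples.
frame = synthesize_frame([1.0, 0.5], [0.8, 0.6],
                         [0.20, 0.40], [0.21, 0.42],
                         [0.0, 1.0], [0.5, 2.0], T=80)
```

The cubic is chosen so that both the phase and its derivative (the frequency) agree with the measured values at each frame boundary, which keeps every component continuous from one frame to the next.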
8.3 Parameter Estimation
Low bit-rate sinusoidal coders estimate the amplitudes at the harmonics of
the fundamental frequency. At low bit rates, the harmonic phases are not
transmitted. Instead, the phases are deduced from the spectral envelope, on the
assumption that it is the gain response of a minimum-phase transfer function,
and are added to the integrals of the component frequencies. The sinusoidal
transform coder (STC) implements the harmonic phases explicitly, whereas
LPC-based coders implement them implicitly through the time-domain LPC
synthesis filter. Improved multi-band excitation (IMBE) coders do not use any
phase information; their phases evolve as the integrals of the component
harmonic frequencies. Restricting the component frequencies to the harmonics and
modelling the phases at the decoder is well suited to stationary voiced
segments of speech. In general, however, the speech signal is not stationary
voiced; it consists of a mixture of voiced and unvoiced segments. When
such mixed segments are synthesized with the phase models described above,
the synthesized speech sounds buzzy. To remove this 'buzziness',
the concept of frequency-domain voicing was introduced into low bit-rate
harmonic coders [5]. Frequency-domain voicing allows the synthesis of
mixed-voiced signals by separating the speech spectrum into frequency bands
marked as either voiced or unvoiced.
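The minimum-phase assumption mentioned above can be made concrete with a short sketch. One standard way to obtain the phase of a minimum-phase system from its gain response alone is to fold the real cepstrum; whether a given coder uses this route or an equivalent one is not stated here, so treat it as an illustration only. The function name `minimum_phase_from_magnitude` is hypothetical, and the sketch assumes a strictly positive envelope sampled on a full FFT grid of even length.

```python
import numpy as np


def minimum_phase_from_magnitude(magnitude):
    """Phase (rad) of the minimum-phase system with the given gain response.

    magnitude: strictly positive gain sampled at the N bins of an N-point
    FFT (N even), including the mirrored negative-frequency half.
    """
    N = len(magnitude)
    # Real cepstrum of the log gain.
    cepstrum = np.fft.ifft(np.log(magnitude)).real
    # Fold the anti-causal part onto the causal part.
    fold = np.zeros(N)
    fold[0] = 1.0
    fold[1:N // 2] = 2.0
    fold[N // 2] = 1.0
    # The imaginary part of the resulting log spectrum is the phase.
    return np.fft.fft(cepstrum * fold).imag


# Check against a filter known to be minimum phase: H(z) = 1 + 0.5 z^-1.
H = np.fft.fft([1.0, 0.5], 512)
phase = minimum_phase_from_magnitude(np.abs(H))
max_error = np.max(np.abs(phase - np.angle(H)))
```

At the decoder, the value of this phase at each harmonic frequency would be added to the running integral of that harmonic's frequency, as the text describes.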
Frequency-domain voicing decisions are usually made for each harmonic
of the speech spectrum. An accurate pitch estimate is therefore a prerequisite
for harmonic amplitude and voicing determination. Frequency-domain voicing
determination techniques based on spectral matching need a high-precision
pitch estimate to perform well. Because the kth harmonic lies at k times the
fundamental frequency, a small pitch error is multiplied by the harmonic
number: it causes large deviations at the high-frequency harmonics, which
are then declared unvoiced. Furthermore, female voices, with their short
pitch periods, are more sensitive to small pitch errors, since a given error
in the period is a larger fraction of a short period. To reduce the
complexity of high-precision pitch estimation, an initial pitch estimate is
usually refined by a limited search around it. Once an accurate pitch has
been determined, harmonic coding usually proceeds with voicing and spectral
amplitude estimation.
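The refinement step can be sketched as follows. This is a minimal example under stated assumptions, not the search used by any specific coder: it scores each candidate fundamental by summing the windowed magnitude spectrum at its harmonics (a simple stand-in for spectral matching) and keeps the best candidate within a small range around the initial estimate. The function name `refine_pitch`, the search range, the step size, and the test signal are all hypothetical.

```python
import numpy as np


def refine_pitch(frame, fs, f0_init, search_hz=5.0, step_hz=0.1, n_fft=8192):
    """Refine an initial pitch estimate (Hz) by a limited search around it."""
    window = np.hanning(len(frame))
    spectrum = np.abs(np.fft.rfft(frame * window, n_fft))
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / fs)
    # Use the same number of harmonics for every candidate so that the
    # scores are comparable; all of them stay below the Nyquist frequency.
    n_harm = int((fs / 2) // (f0_init + search_hz))
    harmonics = np.arange(1, n_harm + 1)
    candidates = np.arange(f0_init - search_hz,
                           f0_init + search_hz + step_hz / 2, step_hz)
    # Score: spectral magnitude summed at the candidate's harmonics.
    scores = [np.interp(harmonics * f, freqs, spectrum).sum()
              for f in candidates]
    return float(candidates[int(np.argmax(scores))])


# Synthetic voiced frame: 19 equal-amplitude harmonics of 207.3 Hz at 8 kHz.
fs = 8000
true_f0 = 207.3
n = np.arange(256)
frame = sum(np.cos(2 * np.pi * k * true_f0 * n / fs) for k in range(1, 20))

# A coarse initial estimate of 205 Hz is refined on a 0.1 Hz grid.
refined = refine_pitch(frame, fs, f0_init=205.0)
```

The high harmonics dominate the precision of the search: an error of 1 Hz in the fundamental displaces the 19th harmonic by 19 Hz, so the score falls off quickly as a candidate moves away from the true pitch.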
 