Harmonic Speech Coding - Digital Speech: Coding for Low Bit Rate Communication Systems

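Figure 8.2 Sinusoidal synthesis with matched frequency tracks. (Block diagram: the frequencies and phases feed a cubic phase interpolator that produces θ(n); the amplitudes feed a linear interpolator that produces A(n); a sine wave generator produces each component, and all the sine waves are summed to give the synthetic speech.)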
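
The synthesis path of Figure 8.2 can be sketched in a few lines of Python. This is a minimal illustration, not the book's implementation: it assumes the commonly used maximally smooth cubic phase rule (the end phase is matched modulo 2π, with the multiple of 2π chosen to keep the phase track as smooth as possible) and linear amplitude interpolation over a frame of N samples, with frequencies in radians per sample. The names `cubic_phase` and `synthesize_frame` and the six-element track tuple are hypothetical.

```python
import math


def cubic_phase(theta0, omega0, theta1, omega1, N):
    """Coefficients (alpha, beta) of theta(n) = theta0 + omega0*n + alpha*n**2 + beta*n**3.

    Frequencies are in radians per sample; N is the frame length in samples.
    theta(N) equals theta1 modulo 2*pi and theta'(N) equals omega1. The 2*pi
    multiple M is the one that keeps the phase track smoothest.
    """
    M = round(((theta0 + omega0 * N - theta1) + (omega1 - omega0) * N / 2) / (2 * math.pi))
    delta = theta1 + 2 * math.pi * M - theta0 - omega0 * N
    alpha = 3 * delta / N**2 - (omega1 - omega0) / N
    beta = -2 * delta / N**3 + (omega1 - omega0) / N**2
    return alpha, beta


def synthesize_frame(tracks, N):
    """Sum of sine waves over one frame of N samples.

    tracks: list of hypothetical (A0, A1, omega0, omega1, theta0, theta1) tuples,
    i.e. amplitude, frequency and phase of a matched track at the start and end
    of the frame.
    """
    out = [0.0] * N
    for A0, A1, w0, w1, p0, p1 in tracks:
        alpha, beta = cubic_phase(p0, w0, p1, w1, N)
        for n in range(N):
            amp = A0 + (A1 - A0) * n / N                       # linear amplitude interpolation
            theta = p0 + w0 * n + alpha * n**2 + beta * n**3   # cubic phase interpolation
            out[n] += amp * math.cos(theta)                    # sine wave generator, summed
    return out
```

For a track with constant amplitude and frequency whose end phase is consistent with that frequency, `delta` is zero, the quadratic and cubic terms vanish, and the frame reduces to a steady cosine.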

8.3 Parameter Estimation

Low bit-rate sinusoidal coders estimate the amplitudes at the harmonics of the fundamental frequency. At low bit-rates, the harmonic phases are not transmitted. Instead, the phases are derived from the spectral envelope, on the assumption that the envelope is the gain response of a minimum-phase transfer function, and are added to the integrals of the component frequencies. STC implements the harmonic phases explicitly, whereas LPC-based coders implement them implicitly through the time-domain LPC synthesis filter. Improved multi-band excitation (IMBE) coders do not use any phase information; the phases evolve as the integrals of the component harmonic frequencies. Restricting the component frequencies to the harmonics and modelling the phases at the decoder is well suited to stationary voiced segments of speech. In general, however, the speech signal is not stationary and voiced throughout; it is a mixture of voiced and unvoiced segments. When such segments are synthesized with the phase models described above, the synthesized speech sounds buzzy. To remove this buzziness, the concept of frequency-domain voicing was introduced into low bit-rate harmonic coders [5]. Frequency-domain voicing allows the synthesis of mixed-voiced signals by dividing the speech spectrum into frequency bands, each marked as either voiced or unvoiced.
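
The decoder-side phase model for the harmonics can be sketched as follows. This is an illustration under stated assumptions, not a particular coder's method: the minimum-phase response matching the envelope magnitude is computed with the folded real cepstrum (one standard route to a minimum-phase phase), and the phase of harmonic k is taken as k times the integrated fundamental phase plus that minimum-phase term sampled at kω0. The function names, the nearest-bin sampling and the naive DFT (used only to keep the example dependency-free) are hypothetical choices.

```python
import cmath
import math


def dft(x, inverse=False):
    """Naive O(N^2) DFT; the inverse includes the 1/N scaling."""
    N = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(N):
        s = sum(x[n] * cmath.exp(sign * 2j * math.pi * k * n / N) for n in range(N))
        out.append(s / N if inverse else s)
    return out


def min_phase_from_magnitude(mag):
    """Minimum-phase response (radians) on an even-length N-point DFT grid,
    derived from the magnitude |H(k)| via the folded real cepstrum."""
    N = len(mag)
    c = [v.real for v in dft([math.log(m) for m in mag], inverse=True)]  # real cepstrum
    folded = [0.0] * N
    folded[0] = c[0]
    for n in range(1, N // 2):
        folded[n] = 2.0 * c[n]        # fold the anticausal half onto the causal half
    folded[N // 2] = c[N // 2]
    return [v.imag for v in dft(folded)]  # imaginary part of log H_min is its phase


def harmonic_phase(k, fundamental_phase, min_phase, omega0):
    """Hypothetical decoder rule: k times the integrated fundamental phase plus the
    minimum-phase term at k*omega0 (nearest DFT bin); omega0 in radians per sample."""
    N = len(min_phase)
    bin_index = round(k * omega0 * N / (2 * math.pi)) % N
    return k * fundamental_phase + min_phase[bin_index]
```

In a decoder, `fundamental_phase` would be the running integral of the fundamental frequency, advanced sample by sample. An IMBE-style decoder, which uses no phase information, would keep only that integrated part and drop the minimum-phase term.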

Frequency-domain voicing decisions are usually made for each harmonic of the speech spectrum. Therefore, an accurate pitch estimate is a prerequisite for harmonic amplitude and voicing determination. Frequency-domain voicing determination techniques based on spectral matching need a high-precision pitch estimate to perform well. Because the k-th harmonic is displaced by k times the error in the fundamental frequency, a small pitch error causes large deviations at the high-frequency harmonics, which are then wrongly declared unvoiced. Furthermore, female voices, with their short pitch periods, are more sensitive to small pitch errors. To reduce the complexity of high-precision pitch estimation, an initial pitch estimate is usually refined by a limited search around it. Having determined an accurate pitch, the harmonic coder usually proceeds with voicing and spectral amplitude estimation.
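
For example, a 2 Hz error on a 150 Hz fundamental moves the 12th harmonic (1800 Hz) by 24 Hz. The refine-then-classify flow described above can be sketched as follows. This is a minimal, hypothetical illustration of a spectral-matching approach in the spirit of multi-band excitation analysis, not the book's algorithm: each harmonic band of a Hann-windowed spectrum is fitted with the window transform centred on k·f0 using a least-squares complex amplitude; the pitch is refined by a grid search over a narrow range around the initial estimate that minimises the total normalised matching error; and a harmonic is declared voiced when its normalised band error falls below a threshold. The ±10 Hz span, 0.5 Hz step, 0.2 threshold and all function names are assumptions.

```python
import cmath
import math


def hann(N):
    """Periodic Hann window w[n] = 0.5 - 0.5*cos(2*pi*n/N)."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * n / N) for n in range(N)]


def dirichlet(f, fs, N):
    """DTFT of an N-point rectangular window at frequency f (Hz)."""
    s = math.sin(math.pi * f / fs)
    if abs(s) < 1e-12:
        return complex(N)
    return cmath.exp(-1j * math.pi * f * (N - 1) / fs) * math.sin(math.pi * f * N / fs) / s


def hann_transform(f, fs, N):
    """Closed-form DTFT of the periodic Hann window: three shifted Dirichlet kernels."""
    b = fs / N
    return 0.5 * dirichlet(f, fs, N) - 0.25 * (dirichlet(f - b, fs, N) + dirichlet(f + b, fs, N))


def analysis_grid(x, fs, fmax, oversample=2):
    """Hann-windowed DTFT of x on a uniform grid (spacing fs / (oversample*N)) up to fmax."""
    N = len(x)
    w = hann(N)
    step = fs / (oversample * N)
    freqs = [i * step for i in range(int(fmax / step) + 1)]
    X = [sum(w[n] * x[n] * cmath.exp(-2j * math.pi * f * n / fs) for n in range(N)) for f in freqs]
    return freqs, X


def harmonic_errors(freqs, X, fs, N, f0, fmax):
    """(matching error, band energy) per harmonic k, fitting X(f) ~ A_k * W(f - k*f0)."""
    results = []
    k = 1
    while (k + 0.5) * f0 <= fmax:
        idx = [i for i, f in enumerate(freqs) if (k - 0.5) * f0 <= f < (k + 0.5) * f0]
        W = [hann_transform(freqs[i] - k * f0, fs, N) for i in idx]
        # Least-squares complex amplitude of harmonic k over its band.
        A = sum(X[i] * Wi.conjugate() for i, Wi in zip(idx, W)) / sum(abs(Wi) ** 2 for Wi in W)
        err = sum(abs(X[i] - A * Wi) ** 2 for i, Wi in zip(idx, W))
        energy = sum(abs(X[i]) ** 2 for i in idx)
        results.append((err, energy))
        k += 1
    return results


def refine_pitch(freqs, X, fs, N, f0_init, fmax, span=10.0, step=0.5):
    """Limited grid search around f0_init minimising the total normalised matching error."""
    best_f0, best_cost = f0_init, float("inf")
    for i in range(int(round(2 * span / step)) + 1):
        f0 = f0_init - span + i * step
        res = harmonic_errors(freqs, X, fs, N, f0, fmax)
        cost = sum(e for e, _ in res) / sum(E for _, E in res)
        if cost < best_cost:
            best_f0, best_cost = f0, cost
    return best_f0


def voicing_decisions(freqs, X, fs, N, f0, fmax, threshold=0.2):
    """True (voiced) for each harmonic whose normalised matching error is below threshold."""
    return [E > 0 and e / E < threshold for e, E in harmonic_errors(freqs, X, fs, N, f0, fmax)]
```

Because the search stays close to the initial estimate, the matching error is evaluated only a few dozen times, and the pitch halving or doubling that a wide harmonic-matching search can fall into is largely kept out of range.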
