Harmonic Speech Coding - Digital Speech: Coding for Low Bit Rate Communication Systems

envelope and the bit rate. The 4.8 kb/s STC uses a 14th-order all-pole model

and quantizes the predictor coefficients in the LSF domain. In addition to the

LSFs, the STC transmits gain, pitch, and voicing.

8.4.2 Improved Multi-Band Excitation, INMARSAT-M Version

Improved multi-band excitation (IMBE) operating at 4.15 kb/s for

INMARSAT-M divides the speech spectrum into several voiced and unvoiced

frequency bands, using the multi-band approach described in Section 8.3.1.

However, IMBE makes the voicing decisions for groups of three harmonics

and a single bit is allocated for each group. The total number of voicing bits

$B_v$ is limited to a maximum of 12, and the harmonics beyond the coverage of voicing are declared unvoiced. The refined pitch is transmitted using eight bits. The frame length is 20 ms, giving 83 bits per frame at 4.15 kb/s, and the remaining bits, i.e. $83 - 8 - B_v$, are allocated for spectral amplitudes. The voiced amplitudes are estimated using equation (8.11) and the unvoiced amplitudes are estimated using equation (8.13). The voiced bands are synthesized as follows:

$$s_v(n) = \sum_{k=\text{voiced}} \hat{A}_k \cos\bigl(k\,\phi_0(n)\bigr) \qquad \text{for } n = 0, 1, 2, \ldots, N-1 \tag{8.21}$$

where $N$ is the frame length and the fundamental phase evolution, $\phi_0(n)$, is defined by the following equations:

$$\phi_0(n) = \phi_0(n-1) + \omega_0(n) \tag{8.22}$$

$$\omega_0(n) = \frac{1}{N}\left[(N-n)\,\omega_0^{l-1} + n\,\omega_0^{l}\right] \tag{8.23}$$

where $\phi_0(-1)$ is $\phi_0(N-1)$ of the previous frame and $\omega_0^{l}$ is the normalized fundamental frequency estimated at the end of the $l$th frame. The amplitudes of the voiced harmonics are linearly interpolated between the analysis points. If the corresponding harmonic of one analysis point does not exist or is declared unvoiced, then its amplitude is set to zero and the harmonic frequency stays constant (set to the frequency of the existing voiced harmonic). However, if the pitch estimate is not steady, neither the pitch nor the amplitudes are interpolated for any harmonic; instead, the overlap-and-add method is used.

The unvoiced component is synthesized using filtered white Gaussian noise. White noise is generated in the time domain and transformed into the frequency domain; the bands corresponding to the voiced components are set to zero; and the unvoiced bands are scaled according to the unvoiced gain factors. The inverse Fourier transform of the modified spectrum gives the unvoiced component, $\hat{s}_{uv}(n)$, which is produced using the overlap and
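The bit budget and the two synthesis paths described in this section can be sketched in Python with NumPy. This is only an illustration under stated assumptions, not the INMARSAT-M reference implementation. The function names are hypothetical. The sketch assumes that a partial last group of harmonics still receives its own voicing bit, that the normalized fundamental frequency is in radians per sample, that each harmonic band extends halfway to its neighbours, and that the unvoiced gain factors multiply the noise spectrum directly. For brevity it holds the voiced amplitudes constant over the frame (the text calls for linear interpolation) and omits the overlap-and-add step.

```python
import numpy as np

FRAME_BITS = 83        # 20 ms frame at 4.15 kb/s (from the text)
PITCH_BITS = 8         # refined pitch (from the text)
MAX_VOICING_BITS = 12  # upper limit on B_v (from the text)


def bit_budget(num_harmonics):
    """Return (B_v, bits left for spectral amplitudes).

    Assumption: one bit per group of three harmonics, rounding up so that
    a partial last group still gets a bit, capped at 12.
    """
    b_v = min(-(-num_harmonics // 3), MAX_VOICING_BITS)  # ceil(L / 3), capped
    return b_v, FRAME_BITS - PITCH_BITS - b_v


def synth_voiced(amps, voiced, w_prev, w_cur, phi_prev, n_samples):
    """Voiced synthesis following equations (8.21) to (8.23).

    amps[k-1] and voiced[k-1] describe harmonic k. Amplitudes are held
    constant over the frame (a simplification of the text).
    Returns the frame and phi_0(N-1), to be passed on as phi_0(-1).
    """
    amps = np.asarray(amps, dtype=float)
    voiced = np.asarray(voiced, dtype=bool)
    n = np.arange(n_samples)
    w = ((n_samples - n) * w_prev + n * w_cur) / n_samples  # (8.23)
    phi = phi_prev + np.cumsum(w)                           # (8.22)
    k = np.flatnonzero(voiced) + 1                          # voiced harmonic numbers
    frame = (amps[k - 1][:, None] * np.cos(k[:, None] * phi[None, :])).sum(axis=0)
    return frame, phi[-1]                                   # (8.21)


def synth_unvoiced(voiced, gains, w0, n_samples, rng):
    """Unvoiced synthesis: shape white Gaussian noise in the frequency domain."""
    voiced = np.asarray(voiced, dtype=bool)
    gains = np.asarray(gains, dtype=float)
    spectrum = np.fft.rfft(rng.standard_normal(n_samples))
    freqs = 2.0 * np.pi * np.arange(spectrum.size) / n_samples  # rad/sample
    k = np.rint(freqs / w0).astype(int)        # nearest harmonic for each bin
    in_range = (k >= 1) & (k <= voiced.size)
    unvoiced_bins = in_range.copy()
    unvoiced_bins[in_range] = ~voiced[k[in_range] - 1]
    scale = np.zeros(spectrum.size)            # voiced and uncovered bins stay zero
    scale[unvoiced_bins] = gains[k[unvoiced_bins] - 1]
    return np.fft.irfft(spectrum * scale, n=n_samples)
```

With 20 harmonics, for instance, `bit_budget(20)` gives `(7, 68)`: seven voicing bits and 68 bits left for the spectral amplitudes. The phase returned by `synth_voiced` is meant to be fed back as `phi_prev` for the next frame, which keeps the fundamental phase continuous across frame boundaries as required by equation (8.22).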
