Multimode Speech Coding - Digital Speech: Coding for Low Bit Rate Communication Systems

where $N$, the length of the analysis frames, is 160 and $e_h$ is an autoregressive energy term given by

$$e_h = 0.9\,e_h + 0.1\,e \quad \text{if } 8e > e_h \tag{9.50}$$
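The recursion in (9.50) can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the function names and the `e_h_init` parameter are hypothetical, and `e` is assumed to be the energy of the current frame, since its definition precedes this excerpt.

```python
def update_energy_reference(e_h, e):
    """One step of (9.50): move e_h towards e only when 8*e > e_h."""
    if 8.0 * e > e_h:
        return 0.9 * e_h + 0.1 * e
    # Low-energy frame: leave the reference unchanged.
    return e_h


def track_energy_reference(frame_energies, e_h_init):
    """Run the update over a sequence of frame energies.

    e_h_init is a caller-supplied starting value (hypothetical parameter).
    """
    e_h = e_h_init
    history = []
    for e in frame_energies:
        e_h = update_energy_reference(e_h, e)
        history.append(e_h)
    return history
```

A frame with energy below one eighth of the current reference leaves `e_h` untouched, so quiet or silent frames do not drag the reference down.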

The condition $8e > e_h$ ensures that $e_h$ is updated only when the speech energy is sufficiently high, and $e_h$ should be initialized to approximately the mean squared energy of voiced speech. Figure 9.19a illustrates the tracked energy over a segment of speech. The low-band to high-band energy ratio, $\gamma_\omega$, is estimated as follows:

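$$\gamma_\omega = \frac{\displaystyle\int_{0}^{1/4} S^2\!\left(\frac{\omega}{\omega_s}\right) d\!\left(\frac{\omega}{\omega_s}\right)}{\displaystyle\int_{1/4}^{1/2} S^2\!\left(\frac{\omega}{\omega_s}\right) d\!\left(\frac{\omega}{\omega_s}\right)} \tag{9.51}$$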

where $\omega_s$ is the sampling frequency and $S(\omega)$ is the speech spectrum. The speech spectrum is estimated using a 512-point FFT, after windowing 240 speech samples with a Kaiser window of $\beta = 6.0$. Figure 9.19b illustrates the low-band to high-band energy ratio over a segment of speech, where the speech signal is shifted down for clarity.
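As a rough illustration of the ratio in (9.51), the sketch below windows a frame with a Kaiser window, evaluates a 512-point transform, and divides the summed power below a quarter of the sampling frequency by the summed power between a quarter and a half. Several choices here are assumptions, not taken from the text: the function names are hypothetical, the integrals are approximated by sums over transform bins, the bin lying exactly at a quarter of the sampling frequency is assigned to the high band, a small `eps` guards against division by zero, and a direct DFT stands in for an FFT routine so that the example needs only the standard library (it evaluates the same bins, just more slowly).

```python
import cmath
import math


def bessel_i0(x, terms=25):
    """Zeroth-order modified Bessel function of the first kind (power series)."""
    total, term = 1.0, 1.0
    for k in range(1, terms):
        term *= (x / (2.0 * k)) ** 2
        total += term
    return total


def kaiser_window(length, beta):
    """Kaiser window of the given length and shape parameter beta."""
    denom = bessel_i0(beta)
    window = []
    for n in range(length):
        r = 2.0 * n / (length - 1) - 1.0          # runs from -1 to +1
        arg = beta * math.sqrt(max(0.0, 1.0 - r * r))
        window.append(bessel_i0(arg) / denom)
    return window


def band_energy_ratio(frame, n_fft=512, beta=6.0, eps=1e-12):
    """Low-band to high-band energy ratio of one frame.

    Low band: bins below fs/4. High band: bins from fs/4 up to fs/2.
    """
    window = kaiser_window(len(frame), beta)
    windowed = [s * w for s, w in zip(frame, window)]
    power = []
    for k in range(n_fft // 2 + 1):               # bins 0 .. fs/2
        acc = sum(x * cmath.exp(-2j * math.pi * k * n / n_fft)
                  for n, x in enumerate(windowed))
        power.append(abs(acc) ** 2)
    quarter = n_fft // 4                          # bin index at fs/4
    low = sum(power[:quarter])
    high = sum(power[quarter:])
    return low / (high + eps)
```

Called on a 240-sample frame, a ratio well above 1 indicates energy concentrated in the lower half of the band, while a ratio well below 1 indicates energy concentrated in the upper half.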

The zero-crossing rate is defined as the number of times the signal changes

sign, divided by the number of samples used in the observation. Figure 9.20a

illustrates the zero-crossing rate over a segment of speech, where the speech

Figure 9.19 Voicing metrics of the initial classification: (a) tracked energy, $t_e$; (b) low-band to high-band energy ratio, $\gamma_\omega$. [Plots not reproduced; axis labels: samples, s(n).]
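The zero-crossing rate defined above (sign changes divided by the number of samples observed) can be sketched as follows. The function name is hypothetical, and treating a sample equal to zero as non-negative is an assumption made here to keep the sign test well defined.

```python
def zero_crossing_rate(samples):
    """Number of sign changes divided by the number of samples observed."""
    if not samples:
        return 0.0
    crossings = sum(
        1
        for prev, cur in zip(samples, samples[1:])
        if (prev >= 0) != (cur >= 0)   # sign differs between neighbours
    )
    return crossings / len(samples)
```

For example, `zero_crossing_rate([1, -1, 1, -1])` counts three sign changes over four samples.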
