Pitch Estimation and Voiced–Unvoiced Classification of Speech - Digital Speech: Coding for Low Bit Rate Communication Systems

MBE Mixed Voicing

The voicing decision is made by examining the normalized distance $D_k$ between the original and estimated speech spectra in frequency bands,

$$D_k = \frac{\sum_{m=a_k}^{b_k} \left| S(m) - S(m, \omega_0) \right|^2}{\sum_{m=a_k}^{b_k} \left| S(m) \right|^2} \qquad (6.59)$$

where $\omega_0$ is the refined fundamental frequency, $a_k$ and $b_k$ are the first and last harmonics in the $k$th band, $S(m)$ is the original speech spectrum, and $S(m, \omega_0)$ is the reconstructed speech spectrum, which is calculated using:

$$S(m, \omega_0) = A_l(\omega_0)\, W(m), \qquad 1 \le l \le L, \quad a_l \le m < b_l \qquad (6.60)$$

where $a_l = \lceil (l - 0.5)\,\omega_0 \rceil$, $b_l = \lceil (l + 0.5)\,\omega_0 \rceil$, $\lceil \cdot \rceil$ means the nearest integer greater than or equal to its argument, $L$ is the number of harmonics within the 4 kHz speech bandwidth, $W(m)$ is the frequency response of a suitable window centred at the $l$th harmonic of the fundamental frequency (see Figure 6.31) and $A_l(\omega_0)$ is

[Plot omitted. Horizontal axis: Frequency (kHz), from −4.0 to 4.0; vertical axis ticks from 40.0 down to −60.0.]

Figure 6.31 Frequency response of the Hamming window
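The band limits of equation (6.60) and the distance of equation (6.59) can be illustrated with a minimal Python sketch. It is not the book's implementation and rests on several assumptions: $\omega_0$ is expressed in DFT-bin units (`w0_bins`), the harmonic amplitudes $A_l(\omega_0)$ are supplied by the caller because this excerpt breaks off before defining them, `window_response` is a hypothetical callable returning $W$ at a given bin offset from the harmonic centre, and $a_k$, $b_k$ in (6.59) are treated as inclusive bin indices. All function names are illustrative.

```python
import math


def band_edges(l, w0_bins):
    """Return (a_l, b_l) = (ceil((l - 0.5) * w0), ceil((l + 0.5) * w0)).

    Assumption: w0_bins is the fundamental frequency in DFT-bin units.
    """
    return math.ceil((l - 0.5) * w0_bins), math.ceil((l + 0.5) * w0_bins)


def reconstruct(amplitudes, window_response, w0_bins, n_bins):
    """Eq. (6.60): S(m, w0) = A_l * W(m) for a_l <= m < b_l, l = 1..L.

    amplitudes[l - 1] holds A_l (supplied by the caller, an assumption).
    window_response(offset) is a hypothetical callable giving the window's
    frequency response at a bin offset from the l-th harmonic.
    """
    s_hat = [0j] * n_bins
    for l, amp in enumerate(amplitudes, start=1):
        a, b = band_edges(l, w0_bins)
        centre = l * w0_bins
        for m in range(a, min(b, n_bins)):
            s_hat[m] = amp * window_response(m - centre)
    return s_hat


def normalized_distance(s, s_hat, a_k, b_k):
    """Eq. (6.59): D_k over bins a_k..b_k (inclusive, an assumption)."""
    bins = range(a_k, b_k + 1)
    num = sum(abs(s[m] - s_hat[m]) ** 2 for m in bins)
    den = sum(abs(s[m]) ** 2 for m in bins)
    if den == 0:
        raise ValueError("band has zero energy; D_k is undefined")
    return num / den


# Usage: a flat (hypothetical) window and two harmonics with w0 = 4 bins.
s_hat = reconstruct([2.0, 3.0], lambda offset: 1.0, 4.0, 16)
d_match = normalized_distance(s_hat, s_hat, 2, 9)         # identical spectra
d_zero = normalized_distance(s_hat, [0j] * 16, 2, 9)      # empty estimate
```

A small $D_k$ indicates that the harmonic model fits the band well. In MBE coders a band is typically declared voiced when $D_k$ falls below a threshold; the threshold is not given in this excerpt.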
