Pitch Estimation and Voiced–Unvoiced Classification of Speech - Digital Speech: Coding for Low Bit Rate Communication Systems


However, modern DSP techniques make the computational complexity of

frequency-domain PDAs insignificant, making them very popular in sinusoidal

coders. In the following, we briefly explain two frequency-domain

PDAs.

Harmonic Peak Detection

An obvious way of determining the pitch in the frequency domain would be to

extract the spectral peak at the fundamental frequency. This requires the first

harmonic to be present, which cannot, in general, be expected because of the

front-end filtering. A more practical method is to detect all of the harmonic

peaks and then measure the fundamental frequency (pitch frequency) as

either the common divisor of these harmonics or the spacing of the adjacent

harmonics. This can be done using a comb filter given by

\[
C(\omega, \omega_0) =
\begin{cases}
W(k\omega_0); & \omega = k\omega_0,\ k = 1, 2, \ldots, \omega_m/\omega_0 \\
0; & \text{otherwise}
\end{cases}
\tag{6.11}
\]

and correlating it with the speech spectrum. The output of the correlation,

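$A_c(\omega_0)$, is the summation of the weighted comb peaks:

\[
A_c(\omega_0) = \frac{\omega_0}{\omega_m} \sum_{k=1}^{\omega_m/\omega_0} S(k\omega_0)\, W(k\omega_0),
\qquad \frac{2\pi}{\tau_{\max}} \le \omega_0 \le \frac{2\pi}{\tau_{\min}}
\tag{6.12}
\]

where $\omega_m$ is the maximum frequency considered in the speech spectrum.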

If $\omega_0$ is equal to the fundamental frequency, the comb response will match

the harmonic peaks, and the maximum output will be obtained as shown in

Figure 6.4. In order to obtain better subjective quality, a weighting coefficient

can be applied to the individual teeth, normally with weights that decrease

with increasing frequency [16].
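
The comb correlation of Equation (6.12) can be illustrated with a short NumPy sketch. It is not the book's implementation. The function name comb_pitch, the Hamming analysis window, the zero-padded FFT length, the 3400 Hz upper frequency, the integer lag range at 8 kHz sampling and the linear weight taper are all assumptions made for illustration. Frequencies are handled in Hz rather than radians, so the candidate pitch for a lag of tau samples is fs / tau.

```python
import numpy as np


def comb_pitch(frame, fs, f_max=3400.0, lag_min=20, lag_max=147, n_fft=4096):
    """Return (pitch_hz, lag) maximising the comb correlation A_c of Eq. (6.12).

    All default values are illustrative assumptions, not taken from the text.
    """
    # Magnitude spectrum S of the Hamming-windowed frame (zero-padded FFT).
    spectrum = np.abs(np.fft.rfft(frame * np.hamming(len(frame)), n_fft))

    best_lag, best_score = lag_min, -np.inf
    for lag in range(lag_min, lag_max + 1):
        f0 = fs / lag                          # candidate pitch frequency in Hz
        n_teeth = int(f_max // f0)             # k = 1 .. floor(f_max / f0)
        teeth_hz = f0 * np.arange(1, n_teeth + 1)
        # Nearest FFT bin to each comb tooth k * f0.
        bins = np.rint(teeth_hz * n_fft / fs).astype(int)
        # Assumed taper W(k * f0): weights fall linearly from 1 towards 0.5.
        weights = 1.0 - 0.5 * teeth_hz / f_max
        # A_c = (f0 / f_max) * sum_k S(k f0) W(k f0)
        score = (f0 / f_max) * np.sum(spectrum[bins] * weights)
        if score > best_score:
            best_lag, best_score = lag, score
    return fs / best_lag, best_lag


# Synthetic voiced frame: 50 ms at 8 kHz, 100 Hz pitch, harmonic amplitudes 1/k.
fs = 8000
t = np.arange(400) / fs
frame = sum(np.cos(2 * np.pi * 100.0 * k * t) / k for k in range(1, 35))

pitch_hz, lag = comb_pitch(frame, fs)
print(f"estimated pitch: {pitch_hz:.1f} Hz (lag {lag} samples)")
```

For this synthetic frame the maximum is expected at a lag of 80 samples, that is, 100 Hz. The example keeps the fundamental in the signal. If the first harmonic is removed or strongly attenuated, a comb at twice the pitch can score higher with this simple weighting, so a practical detector would need additional safeguards against pitch doubling.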

Spectrum Similarity

This method assumes that the spectrum is fully voiced and is composed only

of a number of harmonics each located at multiples of the pitch frequency. A

synthetic spectrum is reconstructed using this assumption for each possible

pitch frequency candidate and is compared to the original spectrum. The

pitch frequency leading to the best matching reconstructed spectrum is then

selected [13] as the fundamental or pitch frequency. The speech spectrum is

assumed to be composed of voiced harmonics only, located at multiples of

the candidate pitch frequency $\omega_0$. Therefore the synthetic spectrum $S(m, \omega_0)$

is an approximation of the convolution of pulses located at multiples of the

candidate pitch frequency $\omega_0$, by the spectrum $W$ of the window used on
