Pitch Estimation and Voiced–Unvoiced Classification of Speech - Digital Speech: Coding for Low Bit Rate Communication Systems

By substituting the optimum gain back into the error function of equation (6.3), the pitch can be estimated by minimizing

$$E(\tau,\beta) = \sum_{n=0}^{N-1} s^2(n) - \frac{\left[\sum_{n=0}^{N-1} s(n)\,s(n+\tau)\right]^2}{\sum_{n=0}^{N-1} s^2(n+\tau)} \tag{6.8}$$

This is equivalent to maximizing the second term on the right hand side,

$$R_n(\tau) = \frac{\left[\sum_{n=0}^{N-1} s(n)\,s(n+\tau)\right]^2}{\sum_{n=0}^{N-1} s^2(n+\tau)} \tag{6.9}$$

Direct use of the above equation may result in some errors. Because the correlation term is squared, a lag at which the correlation is negative can still produce a maximum, which may force pitch-halving errors. To eliminate this problem, the square root of equation (6.9) is taken, which removes the square from the correlation and hence prevents lags with negative correlation from being selected as the pitch. The final normalized autocorrelation function is therefore given by

$$R_n(\tau) = \frac{\sum_{n=0}^{N-1} s(n)\,s(n+\tau)}{\sqrt{\sum_{n=0}^{N-1} s^2(n+\tau)}} \tag{6.10}$$

The normalized autocorrelation function, shown in Figure 6.3c, performs much better than the direct (un-normalized) autocorrelation method.
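The search implied by equation (6.10) can be sketched in a few lines of Python. This is a minimal illustration, not the book's implementation: the function names `normalized_autocorrelation` and `estimate_pitch_lag`, the NumPy dependency, and the lag limits passed by the caller are all assumptions made for the example. The signal passed in must contain at least N + tau_max samples so that the shifted frame s(n + τ) is available for every candidate lag.

```python
import numpy as np

def normalized_autocorrelation(s, tau, N):
    """Evaluate R_n(tau) of equation (6.10) over N samples starting at s[0]."""
    if len(s) < N + tau:
        raise ValueError("need at least N + tau samples")
    x = np.asarray(s[:N], dtype=float)            # s(n),       n = 0..N-1
    y = np.asarray(s[tau:tau + N], dtype=float)   # s(n + tau), n = 0..N-1
    energy = np.dot(y, y)                         # sum of s^2(n + tau)
    if energy == 0.0:
        return 0.0                                # silent segment: no correlation
    return float(np.dot(x, y) / np.sqrt(energy))

def estimate_pitch_lag(s, N, tau_min, tau_max):
    """Return the lag in [tau_min, tau_max] that maximizes R_n(tau)."""
    scores = [normalized_autocorrelation(s, tau, N)
              for tau in range(tau_min, tau_max + 1)]
    return tau_min + int(np.argmax(scores))       # first maximum wins on ties
```

Because the sign of the correlation is kept, a lag at which s(n) and s(n + τ) are in antiphase scores negative and cannot win the search, which is the behaviour the square root in (6.10) is meant to provide. The lag range is left to the caller; a range that includes multiples of the true period can still admit those multiples as maxima.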

6.2.2 Frequency-Domain PDAs

Although most waveform similarity methods have frequency-domain equivalents, frequency-domain PDAs operate directly on the speech spectrum. The main frequency-domain feature of a periodic signal is its harmonic structure, with the distance between harmonics being the fundamental frequency, that is, the frequency equivalent of the pitch period. The main drawback of frequency-domain methods is their high computational complexity.
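As a rough illustration of how harmonic structure can be used, the sketch below scores each candidate fundamental by summing the magnitude spectrum at its first few harmonics and picks the best-scoring candidate. This harmonic-summation scheme is a generic technique chosen for the example and is not claimed to be the method described in this text; the function name `harmonic_sum_f0`, the Hann window, the 1 Hz candidate grid, and the default search range and FFT size are all assumptions.

```python
import numpy as np

def harmonic_sum_f0(frame, fs, f_min=60.0, f_max=400.0,
                    n_harmonics=5, nfft=4096):
    """Pick the candidate F0 (Hz) whose harmonics carry the most magnitude."""
    window = np.hanning(len(frame))
    spectrum = np.abs(np.fft.rfft(frame * window, n=nfft))
    candidates = np.arange(f_min, f_max + 1.0, 1.0)   # 1 Hz grid (assumed)
    scores = np.zeros(len(candidates))
    for i, f0 in enumerate(candidates):
        # FFT bin closest to each harmonic k * f0, k = 1..n_harmonics
        bins = np.rint(np.arange(1, n_harmonics + 1) * f0 * nfft / fs).astype(int)
        bins = bins[bins < len(spectrum)]             # drop bins above Nyquist
        scores[i] = spectrum[bins].sum()
    return float(candidates[np.argmax(scores)])
```

Even this simple version needs a windowed FFT per frame plus a loop over every candidate and harmonic, which gives some feel for the computational cost noted above.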
