Pitch Estimation and Voiced–Unvoiced Classification of Speech - Digital Speech: Coding for Low Bit Rate Communication Systems


The whole search is therefore divided into two procedures: subrange search and subrange comparison. Because the subranges fully overlap one another, the search over them needs to be done only twice, once from left to right and once from right to left. We start with range $R_1$ and compare its minimum with the minimum over the nonoverlapped part of $R_2$, and so on until the whole right-hand side is covered. The same procedure is applied to the left-hand side, starting with $R_4$. Finally, the left-hand and right-hand minima are compared and the overall minimum is selected. As a result, the number of comparisons during the search is independent of the size of the pitch search ranges and equals three times the number of pitch candidates.
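
Read this way, the two-pass search appears closely related to what is now known as the van Herk/Gil-Werman running-minimum scheme. The sketch below assumes that every pitch candidate has a search range of the same width over an array of error values; the names `sliding_window_min`, `errors` and `width` are hypothetical, and the book's exact layout of $R_1, \ldots, R_4$ may differ. One left-to-right pass, one right-to-left pass and one merge pass give roughly three comparisons per candidate, whatever the range width.

```python
def sliding_window_min(errors, width):
    """Return min(errors[i:i + width]) for every start i using one forward pass,
    one backward pass and one merge pass (about three comparisons per sample).
    Hypothetical helper: equal-width search ranges are an assumption."""
    n = len(errors)
    if not 1 <= width <= n:
        raise ValueError("width must be between 1 and len(errors)")

    # Left-to-right pass: running minimum from the start of each width-sized block.
    fwd = [0.0] * n
    for i in range(n):
        fwd[i] = errors[i] if i % width == 0 else min(fwd[i - 1], errors[i])

    # Right-to-left pass: running minimum from the end of each block.
    bwd = [0.0] * n
    for i in range(n - 1, -1, -1):
        if i == n - 1 or (i + 1) % width == 0:
            bwd[i] = errors[i]
        else:
            bwd[i] = min(bwd[i + 1], errors[i])

    # Merge: each window spans at most two blocks, so one comparison per window
    # combines the left-part minimum (bwd) with the right-part minimum (fwd).
    return [min(bwd[i], fwd[i + width - 1]) for i in range(n - width + 1)]


# Hypothetical error values for 8 consecutive lags, each search range 3 lags wide.
print(sliding_window_min([5, 3, 8, 1, 4, 7, 2, 6], 3))   # [3, 1, 1, 1, 2, 2]
```

The backward pass plays the role of the right-to-left sweep and the forward pass the left-to-right sweep; the final `min` corresponds to comparing the left-hand and right-hand minima.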

Multiple Pitch and Half Pitch Errors

Almost all pitch determination algorithms (PDAs) have a peak detector that decides the pitch from the peak position. In time-domain methods, for example, peaks appear not only at the correct pitch lag but also at its integer multiples, so a multiple of the true pitch may be chosen instead. Finding the desired peak among these peaks normally requires a complicated procedure. The basic idea for solving this problem has two steps: pick the maximum peak, then check its submultiple positions to see whether there is a comparable peak. However, since there is no fixed solution to this problem, tuned comparison thresholds are generally used.

For example, in the cross-correlation pitch estimation method, the comparison is made by looking at the ratio $R(\tau_0/i)/R(\tau_0)$, where $\tau_0$ is the initial pitch lag and $i$ is an integer chosen so that the submultiple $\tau_0/i$ is greater than or equal to the minimum expected pitch. The smallest submultiple whose ratio exceeds the set threshold is selected as the pitch.
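
A minimal sketch of this check, assuming integer lags, a normalized cross-correlation as $R$, and rounding of $\tau_0/i$ to the nearest integer lag. The function names and the threshold value 0.8 are hypothetical; the text only says that tuned thresholds are used.

```python
import math


def normalized_xcorr(x, lag):
    """Normalized cross-correlation between x[n] and x[n - lag] within one frame."""
    a, b = x[lag:], x[:len(x) - lag]
    num = sum(p * q for p, q in zip(a, b))
    den = math.sqrt(sum(p * p for p in a) * sum(q * q for q in b))
    return num / den if den > 0 else 0.0


def xcorr_submultiple_check(R, tau0, min_lag, threshold=0.8):
    """Return the smallest submultiple lag tau0/i (i >= 2) whose ratio
    R(tau0/i) / R(tau0) exceeds the (hypothetical) threshold; keep tau0 if none does.

    R is indexed by integer lag and R[tau0] is assumed positive (it is the picked peak).
    """
    # i <= tau0 // min_lag guarantees tau0 / i >= min_lag (the minimum expected pitch).
    for i in range(tau0 // min_lag, 1, -1):    # largest i first = smallest submultiple first
        lag = round(tau0 / i)                   # nearest integer lag to tau0 / i
        if R[lag] / R[tau0] > threshold:
            return lag
    return tau0


# Impulse train with a 40-sample period; suppose the peak picker returned tau0 = 80.
x = [1.0 if n % 40 == 0 else 0.0 for n in range(400)]
R = [normalized_xcorr(x, lag) for lag in range(161)]
print(xcorr_submultiple_check(R, tau0=80, min_lag=20))   # 40
```

In this example the candidate lags 20 and 27 do not correlate with the impulse train, while lag 40 correlates as strongly as lag 80, so the doubled estimate is corrected to 40.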

In frequency-domain methods, such as the spectrum similarity method, a similar procedure can be applied. In this case, the average of the harmonic magnitudes in the signal may be used in the comparison. At every submultiple, the average of the harmonic magnitudes is computed as

$$A_v(\omega_k) = \frac{1}{L_k} \sum_{i=1}^{L_k} A(i\omega_k), \qquad k = 1, 2, 3, \ldots, n \tag{6.41}$$

where $L_k$ is the total number of harmonics in a 4 kHz speech bandwidth, $A(i\omega_k)$ are the harmonic magnitudes, and $\omega_k = 2\pi/(\tau_0/k) = 2\pi k/\tau_0$ is the fundamental frequency of the $k$th submultiple of the initial pitch. The ratio $A_v(\omega_k)/A_v(\omega_1)$ between the smallest submultiple (largest $k$) and the initial pitch $\tau_0$ is then computed and compared with a threshold, which may vary for each submultiple. If this ratio is bigger than the corresponding threshold, the smallest submultiple is selected as the pitch estimate. Otherwise, the next largest submultiple is checked against
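
A sketch of Eq. (6.41) and the submultiple check, under several assumptions: a sampling rate of 8 kHz (so the 4 kHz band ends at $\omega = \pi$ rad/sample), harmonic magnitudes $A(i\omega_k)$ read from the nearest bin of a one-sided DFT magnitude spectrum, and a hypothetical threshold of 1.7 for every $k$. Keeping $\tau_0$ when no submultiple passes is also an assumption of this sketch. The function names are hypothetical.

```python
import math


def harmonic_average(mag, n_fft, omega):
    """Eq. (6.41): A_v(omega) = (1 / L) * sum_{i=1}^{L} A(i * omega).

    mag   -- one-sided magnitude spectrum |X[m]|, m = 0 .. n_fft // 2
    omega -- fundamental frequency in rad/sample
    Assumes fs = 8 kHz, so the 4 kHz speech band ends at omega = pi.
    """
    L = int(math.pi / omega + 1e-9)             # L_k: harmonics inside the 4 kHz band
    total = 0.0
    for i in range(1, L + 1):
        m = round(i * omega * n_fft / (2 * math.pi))   # nearest DFT bin to i * omega
        total += mag[m]
    return total / L


def spectral_submultiple_check(mag, n_fft, tau0, min_lag, thresholds=None, default=1.7):
    """Check submultiples tau0/k from the smallest upward; return the first whose
    ratio A_v(omega_k) / A_v(omega_1) beats its threshold, else tau0 (assumed fallback).
    The threshold values are hypothetical and would need tuning."""
    thresholds = thresholds or {}
    av_initial = harmonic_average(mag, n_fft, 2 * math.pi / tau0)   # k = 1
    for k in range(tau0 // min_lag, 1, -1):      # largest k = smallest submultiple first
        omega_k = 2 * math.pi * k / tau0          # fundamental of the k-th submultiple
        ratio = harmonic_average(mag, n_fft, omega_k) / av_initial
        if ratio > thresholds.get(k, default):
            return tau0 / k
    return tau0


# Hypothetical spectrum: 400-point DFT at 8 kHz (20 Hz bins), harmonics of 200 Hz
# (period 40 samples) with magnitude falling as 1/h; the initial estimate is tau0 = 80.
mag = [0.0] * 201
for h in range(1, 20):
    mag[10 * h] = 200.0 / h
print(spectral_submultiple_check(mag, 400, tau0=80, min_lag=20))   # 40.0
```

Here the ratios for $k = 4, 3, 2$ come out at about 1.6, 0.7 and 2.0, so only $k = 2$ passes. With a perfectly flat harmonic spectrum the $k = 4$ ratio would also exceed 1.7 and the sketch would return 20, which is one reason the thresholds are allowed to differ per submultiple.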
