Digital Signal Processing Reference
6 Pitch Estimation and Voiced-Unvoiced Classification of Speech
6.1 Introduction
Low bit-rate speech coders, traditionally called vocoders, rely heavily on
extracting the correct speech parameters from a given speech segment. The
three main speech features are the spectral envelope, the pitch and the
voiced-unvoiced classification. The spectral envelope is usually extracted by
a standard autocorrelation method, which yields a linear predictive (LP)
parameter representation. However, extracting the correct pitch and voicing
classification is less straightforward and may require a combination of
methods.
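
As a rough illustration of the autocorrelation method mentioned above, the sketch below computes the autocorrelation of one speech frame and solves the LP normal equations with the Levinson-Durbin recursion. The function names, the sign convention A(z) = 1 + a1 z^-1 + ... + ap z^-p, and the omission of any analysis window are assumptions made for this example, not details taken from the chapter.

```python
def autocorrelation(frame, max_lag):
    """Return r[0..max_lag], the unnormalized autocorrelation of one frame."""
    n = len(frame)
    return [sum(frame[i] * frame[i + k] for i in range(n - k))
            for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the LP normal equations from autocorrelation values r.

    Returns (a, err): a[0] = 1 and a[1..order] are the coefficients of the
    prediction-error filter A(z) = 1 + a[1] z^-1 + ... (assumed convention),
    and err is the final prediction-error energy.
    """
    a = [1.0] + [0.0] * order
    err = float(r[0])
    for i in range(1, order + 1):
        if err <= 0.0:  # silent or degenerate frame: stop early
            break
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err  # reflection coefficient for stage i
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a, err
```

A call such as levinson_durbin(autocorrelation(frame, 10), 10) would give a 10th-order predictor for one frame; in practice the frame is usually multiplied by a tapered window (for example a Hamming window) before the autocorrelation is computed.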
When measuring the pitch, it is assumed that voiced signals are formed
by passing a quasi-periodic excitation signal through the LPC filter. The
duration between the pulses in the excitation signal is called the pitch period
T0, and its reciprocal is the fundamental frequency f0 = 1/T0. Correct
estimation of the pitch is essential for good-quality speech coding, since an
incorrect pitch period can seriously degrade the synthesized speech. Pitch determination
algorithms (PDAs) have been studied in both the time and frequency domains,
and comparisons are given in [1, 2]. Traditionally, autocorrelation-based
methods [3] and their variants [4, 5] have been intensively investigated
and widely applied to various speech coders [6-11]. Frequency-domain
approaches [12-14], on the other hand, have become popular more recently due
to the growing interest in sinusoidal speech coders, such as the multi-band
excitation (MBE) coder [13] and the sinusoidal transform coder (STC) [14], which
conduct pitch determination based on a spectral synthesis (SS) method.
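
To make the time-domain idea concrete, here is a minimal sketch of an autocorrelation-style pitch estimator. It uses a normalized cross-correlation, which is one common variant and not necessarily the method of any reference cited above. The function name, the 60-400 Hz search range, and the return format are assumptions chosen for illustration.

```python
import math


def estimate_pitch(frame, fs, f_min=60.0, f_max=400.0):
    """Estimate the pitch of one frame from its normalized correlation peak.

    Returns (T0 in samples, f0 in Hz, peak value), or None for a frame with
    no usable energy. The default search range is an assumed typical range.
    """
    n = len(frame)
    mean = sum(frame) / n
    x = [s - mean for s in frame]  # remove any DC offset
    lag_min = max(1, int(fs / f_max))
    lag_max = min(int(fs / f_min), n - 1)
    best_lag, best_val = None, -1.0
    for lag in range(lag_min, lag_max + 1):
        m = n - lag  # number of overlapping samples at this lag
        num = sum(x[i] * x[i + lag] for i in range(m))
        e0 = sum(x[i] * x[i] for i in range(m))
        e1 = sum(x[i + lag] * x[i + lag] for i in range(m))
        if e0 * e1 == 0.0:
            continue
        val = num / math.sqrt(e0 * e1)  # lies in [-1, 1]
        if val > best_val:
            best_lag, best_val = lag, val
    if best_lag is None:
        return None
    return best_lag, fs / best_lag, best_val
```

The peak value returned alongside the lag is close to 1 for strongly periodic frames and much lower for noise-like frames, so it is often used as one rough cue for voiced-unvoiced classification. A simple peak picker like this one can still lock onto a multiple or submultiple of the true pitch period, which is one reason a combination of methods may be needed.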