Digital Signal Processing Reference
Advances in the understanding of speech production mechanisms in humans,
coupled with similar advances in DSP, have had an impact on speech synthesis
techniques. Perhaps the most significant factors that started a new era in this field
were computer processing and storage technologies. While speech and language
were already important parts of daily life before the invention of the computer, the
equipment and technology that developed over the last several years have made
it possible to produce machines that speak, read, and even carry out dialogs. A
number of vendors provide both speech recognition and speech synthesis technology. Some of the
latest applications of speech synthesis are in cellular phones, security networks, and
robotics.
There are different methods of speech synthesis based on the source. In a text-
to-speech system, the source is a text string of characters read by the program to
generate voice. Another approach is to build intelligence into the program so that
it can generate voice without external excitation. One of the earliest techniques was
formant synthesis. This method was limited in its ability to represent voice with high
fidelity because of its inherent drawback of representing each phoneme by only three
formant frequencies. This method, and several analog technologies that followed, were replaced by
digital methods. Some early digital technologies were RELP (residue excited) and
VELP (voice excited). These were replaced by new technologies, such as LPC
(linear predictive coding), CELP (code excited), and PSOLA (pitch synchronous
overlap-add). These technologies have been extensively used to generate artificial
voice.
Linear Predictive Coding
Most methods that are used for analyzing speech start by transforming acoustic data
into spectral form by performing short time Fourier analysis of the speech wave.
Although this type of spectral analysis is a well-known technique for studying
signals, its application to speech signals suffers from limitations due to the nonstationary
and quasi-periodic properties of the speech wave. As a result, methods based
on spectral analysis often do not provide a sufficiently accurate description of
speech articulation. Linear predictive coding (LPC) represents the speech wave-
form directly in terms of time-varying parameters related to the transfer function
of the vocal tract and the characteristics of the source function. It uses the knowl-
edge that any speech can be represented by certain types of parametric informa-
tion, including the filter coefficients (that model the vocal tract) and the excitation
signal (that maps the source signals). The implementation of LPC reduces to the
calculation of the filter coefficients and excitation signals, making it suitable for
digital implementation.
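The calculation of the filter coefficients described above can be sketched with the standard autocorrelation method and the Levinson-Durbin recursion. This is a minimal illustration in Python with NumPy, not an implementation from this text; the frame and the model order in the usage example are assumed for demonstration.

```python
import numpy as np

def lpc_coefficients(frame, order):
    """Estimate LPC coefficients a = [1, a1, ..., ap] for one speech frame
    using the autocorrelation method (Levinson-Durbin recursion).
    Returns (a, prediction_error_power)."""
    # Autocorrelation of the frame at lags 0..order
    r = np.array([frame[:len(frame) - k] @ frame[k:]
                  for k in range(order + 1)])
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        # Reflection coefficient for this recursion step
        acc = r[i] + a[1:i] @ r[1:i][::-1]
        k = -acc / err
        a_prev = a.copy()
        for j in range(1, i):
            a[j] = a_prev[j] + k * a_prev[i - j]
        a[i] = k
        err *= (1.0 - k * k)
    return a, err

# Illustrative usage on a synthetic frame (order 10 is a typical
# choice for 8 kHz speech; both values are assumptions here)
frame = np.random.default_rng(0).standard_normal(240)
a, err = lpc_coefficients(frame, 10)
```

The resulting vector defines the all-pole vocal-tract filter 1/A(z); the residual power `err` characterizes the excitation signal, which is why LPC reduces to computing just these two quantities.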
Speech sounds are produced as a result of acoustical excitation of the human
vocal tract. During production of voiced sounds, the vocal tract is excited by a
series of nearly periodic pulses generated by the vocal cords. In unvoiced sounds,
excitation is provided by the air passing turbulently through constrictions in the
tract. A simple model of the vocal tract is a discrete time-varying linear filter.
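This source-filter model can be sketched directly: a discrete all-pole filter (the vocal tract) driven either by a periodic pulse train (voiced excitation) or by white noise (unvoiced, turbulent excitation). The filter coefficients, pitch period, and sampling rate below are illustrative assumptions, not values from this text.

```python
import numpy as np

def synthesize(coeffs, excitation):
    """Drive an all-pole (IIR) vocal-tract filter with an excitation:
    s[n] = e[n] - sum_k coeffs[k] * s[n-k], with coeffs = [1, a1, ..., ap]."""
    p = len(coeffs) - 1
    s = np.zeros(len(excitation))
    for n in range(len(excitation)):
        acc = excitation[n]
        for k in range(1, p + 1):
            if n - k >= 0:
                acc -= coeffs[k] * s[n - k]
        s[n] = acc
    return s

fs = 8000                          # assumed sampling rate (Hz)
n_samples = fs // 10               # 100 ms of output

# Voiced source: impulse train at an assumed 100 Hz pitch
voiced_exc = np.zeros(n_samples)
voiced_exc[::fs // 100] = 1.0

# Unvoiced source: white noise models turbulent airflow
unvoiced_exc = np.random.default_rng(1).standard_normal(n_samples)

a = [1.0, -0.9, 0.5]               # illustrative stable vocal-tract coefficients
voiced = synthesize(a, voiced_exc)
unvoiced = synthesize(a, unvoiced_exc)
```

Switching only the excitation while keeping the same filter is exactly the distinction drawn above between voiced and unvoiced production.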