Digital Signal Processing Reference
Advances in the understanding of speech production mechanisms in humans,
coupled with similar advances in DSP, have had an impact on speech synthesis
techniques. Perhaps the most significant factors that started a new era in this field
were computer processing and storage technologies. While speech and language
were already important parts of daily life before the invention of the computer, the
equipment and technology that developed over the last several years have made
it possible to produce machines that speak, read, and even carry out dialogs. A
number of vendors provide both speech recognition and speech synthesis technology. Some of the
latest applications of speech synthesis are in cellular phones, security networks, and
robotics.
There are different methods of speech synthesis based on the source. In a text-
to-speech system, the source is a text string of characters read by the program to
generate voice. Another approach is to build intelligence into the program so that
it can generate voice without external excitation. One of the earliest techniques was
formant synthesis. This method was limited in its ability to represent voice with high
fidelity because of its inherent drawback of representing each phoneme by only three
formant frequencies. This method, and several analog technologies that followed, were replaced by
digital methods. Some early digital technologies were RELP (residue excited) and
VELP (voice excited). These were replaced by new technologies, such as LPC
(linear predictive coding), CELP (code excited), and PSOLA (pitch synchronous
overlap-add). These technologies have been extensively used to generate artificial
voice.
Linear Predictive Coding
Most methods that are used for analyzing speech start by transforming acoustic data
into spectral form by performing short time Fourier analysis of the speech wave.
Although this type of spectral analysis is a well-known technique for studying
signals, its application to speech signals suffers from limitations due to the nonstationary
and quasi-periodic properties of the speech wave. As a result, methods based
on spectral analysis often do not provide a sufficiently accurate description of
speech articulation. Linear predictive coding (LPC) represents the speech wave-
form directly in terms of time-varying parameters related to the transfer function
of the vocal tract and the characteristics of the source function. It uses the knowl-
edge that any speech can be represented by certain types of parametric informa-
tion, including the filter coefficients (that model the vocal tract) and the excitation
signal (that maps the source signals). The implementation of LPC reduces to the
calculation of the filter coefficients and excitation signals, making it suitable for
digital implementation.
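The calculation of the filter coefficients described above can be sketched with the standard autocorrelation method and the Levinson-Durbin recursion. This is a minimal illustration in Python with NumPy, not an implementation from this text; the frame and the model order in the usage example are assumed for demonstration.

```python
import numpy as np

def lpc_coefficients(frame, order):
    """Estimate LPC coefficients a = [1, a1, ..., ap] for one speech frame
    using the autocorrelation method (Levinson-Durbin recursion).
    Returns (a, prediction_error_power)."""
    # Autocorrelation of the frame at lags 0..order
    r = np.array([frame[:len(frame) - k] @ frame[k:]
                  for k in range(order + 1)])
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        # Reflection coefficient for this recursion step
        acc = r[i] + a[1:i] @ r[1:i][::-1]
        k = -acc / err
        a_prev = a.copy()
        for j in range(1, i):
            a[j] = a_prev[j] + k * a_prev[i - j]
        a[i] = k
        err *= (1.0 - k * k)
    return a, err

# Illustrative usage on a synthetic frame (order 10 is a typical
# choice for 8 kHz speech; both values are assumptions here)
frame = np.random.default_rng(0).standard_normal(240)
a, err = lpc_coefficients(frame, 10)
```

The resulting vector defines the all-pole vocal-tract filter 1/A(z); the residual power `err` characterizes the excitation signal, which is why LPC reduces to computing just these two quantities.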
Speech sounds are produced as a result of acoustical excitation of the human
vocal tract. During production of voiced sounds, the vocal tract is excited by a
series of nearly periodic pulses generated by the vocal cords. In unvoiced sounds,
excitation is provided by the air passing turbulently through constrictions in the
tract. A simple model of the vocal tract is a discrete time-varying linear filter.
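This source-filter model can be sketched directly: a discrete all-pole filter (the vocal tract) driven either by a periodic pulse train (voiced excitation) or by white noise (unvoiced, turbulent excitation). The filter coefficients, pitch period, and sampling rate below are illustrative assumptions, not values from this text.

```python
import numpy as np

def synthesize(coeffs, excitation):
    """Drive an all-pole (IIR) vocal-tract filter with an excitation:
    s[n] = e[n] - sum_k coeffs[k] * s[n-k], with coeffs = [1, a1, ..., ap]."""
    p = len(coeffs) - 1
    s = np.zeros(len(excitation))
    for n in range(len(excitation)):
        acc = excitation[n]
        for k in range(1, p + 1):
            if n - k >= 0:
                acc -= coeffs[k] * s[n - k]
        s[n] = acc
    return s

fs = 8000                          # assumed sampling rate (Hz)
n_samples = fs // 10               # 100 ms of output

# Voiced source: impulse train at an assumed 100 Hz pitch
voiced_exc = np.zeros(n_samples)
voiced_exc[::fs // 100] = 1.0

# Unvoiced source: white noise models turbulent airflow
unvoiced_exc = np.random.default_rng(1).standard_normal(n_samples)

a = [1.0, -0.9, 0.5]               # illustrative stable vocal-tract coefficients
voiced = synthesize(a, voiced_exc)
unvoiced = synthesize(a, unvoiced_exc)
```

Switching only the excitation while keeping the same filter is exactly the distinction drawn above between voiced and unvoiced production.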