Mathematical Preliminaries for Lossy Coding - Introduction to Data Compression

8.6.3 Physical Models

Physical models are based on the physics of how the source output is produced. In general,
the physics is complicated and not amenable to a reasonable mathematical approximation. An
exception to this rule is speech generation.

Speech Production

There has been a significant amount of research conducted in the area of speech production
[116], and volumes have been written about it. We will try to summarize some of the pertinent
aspects in this section.

Speech is produced by forcing air through an elastic opening, the vocal cords, and then
through cylindrical tubes with nonuniform diameter (the laryngeal, oral, nasal, and pharyngeal
passages), and finally through cavities with changing boundaries, such as the mouth and the
nasal cavity. Everything past the vocal cords is generally referred to as the vocal tract. The
first action generates the sound, which is then modulated into speech as it travels through
the vocal tract.

We will often be talking about filters in the coming chapters, and we will describe them
more precisely at that time. For our purposes at present, a filter is a system that has an input,
an output, and a rule for converting the input to the output, which we will call the transfer
function. If we think of speech as the output of a filter, the sound generated by the air rushing
past the vocal cords can be viewed as the input, while the rule for converting the input to the
output is governed by the shape and physics of the vocal tract.
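
The idea of a filter as an input, an output, and a rule connecting them can be made concrete
with a small sketch. The two-point averaging rule below is only a hypothetical example chosen
for illustration; it is not the rule that describes the vocal tract, and the function name is
an assumption of this sketch.

```python
def two_point_average(x):
    """Hypothetical example filter (not from the text).

    Rule: each output sample is the mean of the current input sample
    and the previous one. The input before the first sample is taken as 0.
    """
    y = []
    prev = 0.0
    for sample in x:
        y.append(0.5 * (sample + prev))
        prev = sample
    return y


# Input: a short sequence that drops from 1 to 0.
x = [1.0, 1.0, 1.0, 0.0, 0.0]
# Output: the same sequence with its edges smoothed by the rule.
y = two_point_average(x)
print(y)  # [0.5, 1.0, 1.0, 0.5, 0.0]
```

Changing the rule while keeping the input fixed changes the output, which is the sense in which
the output depends on both the input and the transfer function.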

The output depends on the input and the transfer function. Let's look at each in turn.

There are several different forms of input that can be generated by different conformations
of the vocal cords and the associated cartilages. If the vocal cords are stretched shut and we
force air through, the vocal cords vibrate, providing a periodic input. If a small aperture is left
open, the input resembles white noise. By opening an aperture at different locations along the
vocal cords, we can produce a white noise-like input with certain dominant frequencies that
depend on the location of the opening.

As for the transfer function, the vocal tract can be modeled as a series of tubes
of unequal diameter. If we examine how an acoustic wave travels through this series of
tubes, we find that the mathematical model that best describes this process is an autoregressive
model. We will often encounter the autoregressive model when we discuss speech compression
algorithms.
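
The two kinds of input and the autoregressive description of the vocal tract can be combined in
a short sketch. It assumes the standard autoregressive form, in which each output sample is a
weighted sum of the previous N output samples plus the current input sample:
y[n] = a_1 y[n-1] + ... + a_N y[n-N] + e[n]. The coefficient values, the pulse period, the
sequence lengths, and all function names below are illustrative assumptions, not values taken
from the text or measured from speech.

```python
import random


def ar_filter(excitation, coeffs):
    """Autoregressive filter: y[n] = sum_i coeffs[i-1] * y[n-i] + excitation[n].

    Output samples before the start of the sequence are treated as 0.
    """
    y = []
    for n, e in enumerate(excitation):
        acc = e
        for i, a in enumerate(coeffs, start=1):
            if n - i >= 0:
                acc += a * y[n - i]
        y.append(acc)
    return y


def impulse_train(length, period):
    """Periodic input: a unit pulse every `period` samples."""
    return [1.0 if n % period == 0 else 0.0 for n in range(length)]


def white_noise(length, seed=0):
    """White-noise-like input: independent Gaussian samples."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(length)]


# Assumed second-order coefficients. The poles of this filter lie inside
# the unit circle (magnitude sqrt(0.8)), so the output does not blow up.
coeffs = [1.3, -0.8]

periodic_out = ar_filter(impulse_train(200, 50), coeffs)
noise_out = ar_filter(white_noise(200), coeffs)
```

Both outputs pass through the same filter and differ only in the input, mirroring the
separation between the input produced at the vocal cords and the rule imposed by the vocal tract.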

8.7 Summary

In this chapter, we have looked at a variety of topics that will be useful to us when we study
various lossy compression techniques, including distortion and its measurement, some new
concepts from information theory, average mutual information and its connection to the rate
of a compression scheme, and the rate distortion function. We have also briefly looked at
some of the properties of the human visual and auditory systems, most importantly visual
and auditory masking. The masking phenomena allow us to incur distortion in such a way that
