8.6.3 Physical Models
Physical models are based on the physics of the source output production. The physics are
generally complicated and not amenable to a reasonable mathematical approximation. An
exception to this rule is speech generation.
Speech Production
There has been a significant amount of research conducted in the area of speech production
[116], and volumes have been written about it. We will try to summarize some of the pertinent
aspects in this section.
Speech is produced by forcing air through an elastic opening, the vocal cords, and then
through cylindrical tubes with nonuniform diameter (the laryngeal, oral, nasal, and pharyngeal
passages), and finally through cavities with changing boundaries, such as the mouth and the
nasal cavity. Everything past the vocal cords is generally referred to as the vocal tract. The
first action, forcing air past the vocal cords, generates the sound, which is then modulated
into speech as it travels through the vocal tract.
Filters will come up often in the coming chapters, where we will describe them more
precisely. For our purposes at present, a filter is a system that has an input
and an output and a rule for converting the input to the output, which we will call the transfer
function. If we think of speech as the output of a filter, the sound generated by the air rushing
past the vocal cords can be viewed as the input, while the rule for converting the input to the
output is governed by the shape and physics of the vocal tract.
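To make the input/rule/output picture concrete, here is a minimal sketch of a filter in Python. The two-point moving average used as the rule is only an illustrative assumption; it is not the vocal tract's transfer function, and the name moving_average_filter is hypothetical.

```python
def moving_average_filter(x):
    """Illustrative rule: y[n] = 0.5 * (x[n] + x[n-1]), with x[-1] taken as 0."""
    y = []
    previous = 0.0  # assumed initial condition: the input is zero before it starts
    for sample in x:
        y.append(0.5 * (sample + previous))
        previous = sample
    return y


# Input: a constant signal that switches on at n = 0.
step_input = [1.0, 1.0, 1.0, 1.0]
# Output: the same signal after the rule has been applied.
step_output = moving_average_filter(step_input)
print(step_output)
```

The same input passed through a different rule would give a different output, which is the sense in which the output depends on both the input and the transfer function.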
The output depends on the input and the transfer function. Let's look at each in turn.
There are several different forms of input that can be generated by different conformations
of the vocal cords and the associated cartilages. If the vocal cords are stretched shut and we
force air through, the vocal cords vibrate, providing a periodic input. If a small aperture is left
open, the input resembles white noise. By opening an aperture at different locations along the
vocal cords, we can produce a white noise-like input with certain dominant frequencies that
depend on the location of the opening.
The vocal tract can be modeled as a series of tubes
of unequal diameter. If we now examine how an acoustic wave travels through this series of
tubes, we find that the mathematical model that best describes this process is an autoregressive
model. We will often encounter the autoregressive model when we discuss speech compression
algorithms.
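The source-filter view described above can be sketched in a few lines of Python. An autoregressive model of order p computes each output sample as y[n] = a_1 y[n-1] + ... + a_p y[n-p] + e[n], where e[n] is the input (excitation). In this sketch the excitation is either a periodic impulse train, standing in for vibrating vocal cords, or white noise, standing in for air forced through a small aperture. The coefficient values, the pitch period, and all function names are hypothetical choices for illustration, not values taken from a real vocal tract.

```python
import random


def impulse_train(num_samples, period):
    """Periodic excitation: a unit impulse every `period` samples."""
    return [1.0 if n % period == 0 else 0.0 for n in range(num_samples)]


def white_noise(num_samples, seed=0):
    """Noise-like excitation: independent Gaussian samples."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(num_samples)]


def ar_filter(excitation, coeffs):
    """Autoregressive model: y[n] = sum_i coeffs[i-1] * y[n-i] + excitation[n]."""
    y = []
    for n, e in enumerate(excitation):
        acc = e
        for i, a in enumerate(coeffs, start=1):
            if n - i >= 0:  # samples before n = 0 are assumed to be zero
                acc += a * y[n - i]
        y.append(acc)
    return y


# Hypothetical second-order coefficients, chosen so the filter is stable.
coeffs = [1.3, -0.8]

voiced = ar_filter(impulse_train(200, 50), coeffs)   # periodic input
unvoiced = ar_filter(white_noise(200), coeffs)       # noise-like input
```

Both outputs are shaped by the same coefficients; only the excitation differs, mirroring the split between the input and the transfer function.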
8.7 Summary
In this chapter, we have looked at a variety of topics that will be useful to us when we study
various lossy compression techniques, including distortion and its measurement, some new
concepts from information theory, average mutual information and its connection to the rate
of a compression scheme, and the rate distortion function. We have also briefly looked at
some of the properties of the human visual and auditory systems—most importantly, visual
and auditory masking. The masking phenomena allow us to incur distortion in such a way that