The fundamental frequency of a sound is referred to as F0 and is present in voiced sounds [21].
Frequencies induced by the speech cavities, called formants, are referred to as F1, F2, and so on,
in order of increasing frequency. The fundamental frequency and formants are arguably the most
important concepts in processing speech.
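As a rough illustration of what F0 means in practice, the following sketch estimates the fundamental frequency of a synthetic, voiced-like signal by picking the lag with the strongest autocorrelation. This is a common textbook technique, not a method taken from the text. The function names, the 60 to 400 Hz search range, and the harmonic amplitudes are assumptions chosen for the example.

```python
import math

def synth_voiced(f0, sample_rate, num_samples, harmonic_amps=(1.0, 0.5, 0.25)):
    """Sum of harmonics of f0: a crude stand-in for a voiced sound (assumed amplitudes)."""
    return [
        sum(a * math.sin(2 * math.pi * f0 * (k + 1) * t / sample_rate)
            for k, a in enumerate(harmonic_amps))
        for t in range(num_samples)
    ]

def estimate_f0(signal, sample_rate, f_min=60.0, f_max=400.0):
    """Return sample_rate / lag for the lag with the largest autocorrelation."""
    lag_min = int(sample_rate / f_max)
    lag_max = min(int(sample_rate / f_min), len(signal) - 1)
    best_lag, best_score = lag_min, float("-inf")
    for lag in range(lag_min, lag_max + 1):
        # Unnormalized autocorrelation of the signal with itself shifted by `lag`.
        score = sum(signal[i] * signal[i + lag] for i in range(len(signal) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag

# 50 ms of a 100 Hz voiced-like tone sampled at 8 kHz (all values are assumptions).
signal = synth_voiced(f0=100.0, sample_rate=8000, num_samples=400)
f0_estimate = estimate_f0(signal, sample_rate=8000)
print(f"estimated F0: {f0_estimate:.1f} Hz")
```

Real speech is noisier than this synthetic tone, so a practical estimator would usually add windowing, normalization, and a voiced/unvoiced decision.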
While most of this activity is interior and therefore not directly observable, it can produce motion in
the skin that may be important in some animation. Certainly it is important to correctly animate the lips
and the surrounding area of the face. Animating the tongue is also usually of some importance.
Some information can be gleaned from a time-amplitude graph of the sound, but more informative
is a time-frequency graph with amplitude encoded using color. Using these spectrographs, trained
professionals can determine the basic sounds of speech.
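The time-frequency graph described above can be sketched as a short-time Fourier analysis: the signal is cut into overlapping frames, and the magnitude at each frequency is computed per frame. The following is a minimal sketch using a naive DFT from the standard library. The frame length, hop size, Hann window, and test tone are assumptions for illustration; a real tool would use an FFT library and map the magnitudes to colors.

```python
import cmath
import math

def spectrogram(signal, frame_len=64, hop=32):
    """Return a list of frames; each frame holds magnitudes for bins 0..frame_len/2."""
    # Hann window to reduce leakage between neighboring frequency bins.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len) for n in range(frame_len)]
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        chunk = [signal[start + n] * window[n] for n in range(frame_len)]
        mags = []
        for k in range(frame_len // 2 + 1):
            # Naive DFT for bin k (O(N^2) overall, fine for a small demo).
            acc = sum(chunk[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n in range(frame_len))
            mags.append(abs(acc))
        frames.append(mags)
    return frames

sample_rate = 8000  # assumed sampling rate in Hz
tone = [math.sin(2 * math.pi * 1000 * t / sample_rate) for t in range(256)]
spec = spectrogram(tone)

# Strongest bin per frame, converted to Hz (bin width = sample_rate / frame_len).
peak_bins = [max(range(len(frame)), key=frame.__getitem__) for frame in spec]
peak_hz = [b * sample_rate / 64 for b in peak_bins]
print(peak_hz)
```

Each row of `spec` is one time slice and each column one frequency bin, which is the grid a color-coded plot would display.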
...
10.4.2 Phonemes
In trying to understand speech and how a person produces it, a common approach is to break it down
into a simple set of constituent, atomic sound segments. The most commonly used segments are called
phonemes. Although the specific number of phonemes varies from source to source, there are generally
considered to be around 42 phonemes.
The corresponding facial poses that produce these sounds are referred to as visemes. Visemes that
are similar enough can be combined into a single unique viseme and the resulting set of facial poses can
be used (Figure 10.22), for example, as blend shapes for a simple type of lip-sync animation.
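A simple blend-shape lip sync of this kind can be sketched as linear interpolation between viseme poses placed on a timeline. The code below is only an illustrative sketch: the three-value pose vectors (jaw opening, lip width, lip pucker), their numbers, and the key times are invented for the example, and a real face rig would blend full sets of vertex positions or control values.

```python
from bisect import bisect_right

# Hypothetical poses: (jaw_open, lip_width, lip_pucker). Values are invented.
VISEME_POSES = {
    "Aah":     [1.0, 0.5, 0.0],
    "B, M, P": [0.0, 0.5, 0.0],
    "Ooh, Q":  [0.25, 0.0, 1.0],
}

def blend(pose_a, pose_b, t):
    """Linear interpolation between two poses, with t in [0, 1]."""
    return [(1.0 - t) * a + t * b for a, b in zip(pose_a, pose_b)]

def sample_track(keys, time):
    """keys: list of (time_in_seconds, viseme_name), sorted by time."""
    times = [k[0] for k in keys]
    if time <= times[0]:
        return list(VISEME_POSES[keys[0][1]])
    if time >= times[-1]:
        return list(VISEME_POSES[keys[-1][1]])
    i = bisect_right(times, time) - 1          # key at or before `time`
    (t0, name0), (t1, name1) = keys[i], keys[i + 1]
    t = (time - t0) / (t1 - t0)                # normalized position between the keys
    return blend(VISEME_POSES[name0], VISEME_POSES[name1], t)

track = [(0.0, "Aah"), (0.2, "B, M, P"), (0.4, "Ooh, Q")]
pose = sample_track(track, 0.1)  # halfway between "Aah" and "B, M, P"
print(pose)
```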
However, the sounds and associated lip movements are much more complex than can be represented
by simply interpolating between static poses. Within the context of speech, a phoneme is
FIGURE 10.22
Viseme set [30]. Poses shown: Aah; B, M, P; Ch, J, Sh; D, T; Dh, Th; Eh; F, V; K, E, I; R; Oh; Ooh, Q; S.
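The grouping in Figure 10.22 can be read as a many-to-one lookup from sounds to visemes. The following sketch builds that lookup directly from the figure's labels and collapses consecutive sounds that share a pose. The function and variable names are hypothetical, and the sound labels are those of the figure rather than a standard phoneme alphabet.

```python
# Viseme groups as labeled in Figure 10.22; each label lists the sounds sharing a pose.
VISEME_LABELS = [
    "Aah", "B, M, P", "Ch, J, Sh", "D, T", "Dh, Th", "Eh",
    "F, V", "K, E, I", "R", "Oh", "Ooh, Q", "S",
]

# Map every individual sound to the label of the viseme that covers it.
SOUND_TO_VISEME = {
    sound.strip(): label
    for label in VISEME_LABELS
    for sound in label.split(",")
}

def to_visemes(sounds):
    """Map sound labels to viseme labels, merging consecutive repeats."""
    result = []
    for sound in sounds:
        viseme = SOUND_TO_VISEME[sound]  # raises KeyError for sounds not in the figure
        if not result or result[-1] != viseme:
            result.append(viseme)
    return result

sequence = to_visemes(["M", "Aah", "P", "B", "Oh"])
print(sequence)
```

Here "P" and "B" collapse into a single "B, M, P" pose because they are adjacent and share a viseme, while the earlier "M" stays separate because "Aah" comes between them.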
 