manifested as variations of the basic sound. These variations of a phoneme are called allophones, and
the mechanism producing the variations is referred to as coarticulation. While the precise mechanism
of coarticulation is a subject for debate, the basic reason for it is that the ideal sounds are slurred
together as phonemes are strung one after the other. Similarly, the visemes of visual speech are
modified by the context of the speech in which they occur. The speed at which speech is produced and the
physical limitations of the speech articulators result in an inability to attain ideal pronunciation. This
slurs the visemes together and, in turn, slurs the phonemes. How to compute this slurring remains a
subject of research.
10.4.3 Coarticulation
One of the complicating factors in automatically producing realistic (both aural and visual) lip-sync
animation is the effect that one phoneme has on adjacent phonemes. Adjacent phonemes affect the
motion of the speech articulators as they form the sound for a phoneme. The resulting subtle change
in the sound of the phoneme produces what is referred to as an allophone of the phoneme. This effect is
known as coarticulation. Lack of coarticulation is one of the main reasons that lip-sync animation using
blend shapes appears unrealistic. Various strategies have been proposed in the literature to
compute the effects of coarticulation. Cohen and Massaro [6] use weighting functions, called
dominance functions, to perform a priority-based blend of adjacent phonemes. King and Parent [18]
have modified and extended the idea to animating song. Pelachaud et al. [26] cluster phonemes based
on deformability and use a look-ahead procedure that applies forward and backward coarticulation
rules. Other approaches include the use of constraints [12], physics [2][30], rules [5], and syllables
[19]. None has proven to be a completely satisfying solution for automatically producing realistic
audiovisual speech.
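To make the idea of a priority-based blend concrete, the following is a minimal sketch in Python, not the implementation from [6]. It assumes each speech segment has one ideal target value for a single articulatory parameter and a dominance that decays exponentially with temporal distance from the segment's center; the blended value is the dominance-weighted average of the targets. The names Segment, dominance, and blend, and all numeric values, are hypothetical and chosen only for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class Segment:
    """One speech segment controlling a single articulatory parameter.

    All fields are illustrative assumptions, not values from the literature.
    """
    center: float     # time of peak influence, in seconds
    target: float     # ideal parameter value (e.g., normalized lip opening)
    magnitude: float  # peak dominance of this segment
    rate: float       # how quickly dominance falls off with time distance


def dominance(seg: Segment, t: float, power: float = 1.0) -> float:
    """Assumed dominance shape: exponential decay around the segment center."""
    return seg.magnitude * math.exp(-seg.rate * abs(t - seg.center) ** power)


def blend(segments: list[Segment], t: float) -> float:
    """Dominance-weighted average of all segment targets at time t."""
    weights = [dominance(s, t) for s in segments]
    total = sum(weights)
    if total == 0.0:
        # Every dominance underflowed to zero; fall back to the nearest segment.
        return min(segments, key=lambda s: abs(t - s.center)).target
    return sum(w * s.target for w, s in zip(weights, segments)) / total


if __name__ == "__main__":
    # A closed-lip segment followed by an open-lip segment.
    closed = Segment(center=0.0, target=0.0, magnitude=1.0, rate=10.0)
    opened = Segment(center=0.2, target=1.0, magnitude=1.0, rate=10.0)
    for step in range(5):
        t = step * 0.05
        print(f"t={t:.2f}  lip opening={blend([closed, opened], t):.3f}")
```

In this sketch the blended value at the center of the closed-lip segment is already pulled slightly toward the neighboring open-lip target, which is the kind of slurring the text describes: the ideal target is approached but not attained. Raising a segment's magnitude gives it priority over its neighbors, and raising its rate narrows its reach in time.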
10.4.4 Prosody
Another complicating factor in realistic lip-sync animation is changing neutral speech to reflect
emotional stress. Such stress is referred to as prosody. Effects of prosody include changing the duration,
pitch, and amplitude of words or phrases of an utterance. This is an active area of research (e.g., [3][4]
[5][11][20]).
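As a rough illustration of what changing duration, pitch, and amplitude means for a lip-sync pipeline, the sketch below scales those three properties for one word and then recomputes word start and end times, since any animation keys tied to the audio must shift when a duration changes. This is an assumed, simplified representation rather than a method from the cited work; WordTiming, apply_stress, and schedule are hypothetical names, and the scale factors are arbitrary.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class WordTiming:
    """Hypothetical per-word record for a neutral utterance."""
    text: str
    duration: float   # seconds
    pitch: float      # fundamental frequency, in Hz
    amplitude: float  # linear gain, 1.0 = neutral


def apply_stress(word: WordTiming, duration_scale: float = 1.0,
                 pitch_scale: float = 1.0,
                 amplitude_scale: float = 1.0) -> WordTiming:
    """Return a copy of the word with its three prosodic properties scaled."""
    return replace(
        word,
        duration=word.duration * duration_scale,
        pitch=word.pitch * pitch_scale,
        amplitude=word.amplitude * amplitude_scale,
    )


def schedule(words: list[WordTiming]) -> list[tuple[float, float]]:
    """Compute (start, end) times for each word, laid end to end."""
    times = []
    start = 0.0
    for word in words:
        end = start + word.duration
        times.append((start, end))
        start = end
    return times


if __name__ == "__main__":
    neutral = [
        WordTiming("I", 0.2, 120.0, 1.0),
        WordTiming("never", 0.4, 120.0, 1.0),
        WordTiming("said", 0.3, 120.0, 1.0),
    ]
    # Emphasize the second word: longer, higher, louder (arbitrary factors).
    stressed = list(neutral)
    stressed[1] = apply_stress(neutral[1], duration_scale=1.5,
                               pitch_scale=1.2, amplitude_scale=1.3)
    for word, (start, end) in zip(stressed, schedule(stressed)):
        print(f"{word.text:6s} {start:.2f}-{end:.2f} s  "
              f"{word.pitch:.0f} Hz  gain {word.amplitude:.1f}")
```

Stretching one word delays every word after it, so viseme keys for the rest of the utterance would need to be retimed from the new schedule rather than reused from the neutral one.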
10.5 Chapter summary
Facial animation presents interesting challenges. As opposed to most other areas of computer
animation, the foundational science is largely incomplete. This, coupled with the complexity inherent in the
facial structure of muscles, bones, fatty tissue, and other anatomic elements, makes facial animation
one area that has not been conquered by the computer.