Facial Animation - Computer Animation: Algorithms and Techniques

manifested as variations of the basic sound. These variations of a phoneme are called allophones, and the mechanism producing the variations is referred to as coarticulation. While the precise mechanism of coarticulation is a subject for debate, the basic reason for it is that the ideal sounds are slurred together as phonemes are strung one after the other. Similarly, the visemes of visual speech are modified by the context of the speech in which they occur. The speed at which speech is produced and the physical limitations of the speech articulators result in an inability to attain ideal pronunciation. This slurs the visemes together and, in turn, slurs the phonemes. The computation of this slurring is the subject of research.

10.4.3 Coarticulation

One of the complicating factors in automatically producing realistic (both audial and visual) lip-sync animation is the effect that one phoneme has on adjacent phonemes. Adjacent phonemes affect the motion of the speech articulators as they form the sound for a phoneme. The resulting subtle change in the sound of the phoneme produces what is referred to as an allophone of the phoneme. This effect is known as coarticulation. Lack of coarticulation is one of the main reasons that lip-sync animation using blend shapes appears unrealistic. Various strategies have been proposed in the literature to compute the effects of coarticulation. Cohen and Massaro [6] use weighting functions, called dominance functions, to perform a priority-based blend of adjacent phonemes. King and Parent [18] have modified and extended the idea to animating song. Pelachaud et al. [26] cluster phonemes based on deformability and use a look-ahead procedure that applies forward and backward coarticulation rules. Other approaches include the use of constraints [12], physics [2][30], rules [5], and syllables [19]. None has proven to be a completely satisfying solution for automatically producing realistic audiovisual speech.
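As an illustration of the dominance-function idea, the following Python sketch blends the target values of neighboring speech segments for a single facial parameter (for example, lip opening). It is a minimal sketch under stated assumptions, not the implementation from [6]: the negative-exponential shape of the dominance function, the names Segment, dominance, and blend, and all numeric values are assumptions chosen for illustration. Each segment's dominance peaks at its center time and decays with temporal distance, and the parameter value at time t is the dominance-weighted average of the segment targets.

```python
import math
from dataclasses import dataclass


@dataclass
class Segment:
    """Influence of one speech segment on one facial parameter (hypothetical structure)."""
    center: float        # time in seconds at which the segment's dominance peaks
    target: float        # ideal parameter value for the segment, e.g. lip opening
    alpha: float = 1.0   # peak dominance, i.e. the segment's priority (assumed value)
    theta: float = 10.0  # decay rate; larger values shorten the span of influence
    power: float = 1.0   # exponent applied to the temporal distance


def dominance(seg: Segment, t: float) -> float:
    """Assumed negative-exponential dominance: alpha * exp(-theta * |t - center|^power)."""
    return seg.alpha * math.exp(-seg.theta * abs(t - seg.center) ** seg.power)


def blend(segments: list[Segment], t: float) -> float:
    """Dominance-weighted average of the segment targets at time t."""
    weights = [dominance(s, t) for s in segments]
    total = sum(weights)
    if total == 0.0:
        # Every dominance underflowed to zero: fall back to the nearest segment.
        return min(segments, key=lambda s: abs(t - s.center)).target
    return sum(w * s.target for w, s in zip(weights, segments)) / total


if __name__ == "__main__":
    # Two adjacent segments: a closed-lip target followed by an open-lip target.
    track = [Segment(center=0.0, target=0.0), Segment(center=0.2, target=1.0)]
    for i in range(5):
        t = i * 0.05
        print(f"t={t:.2f}  lip opening={blend(track, t):.3f}")
```

Because the result is a weighted average, the parameter never reaches a segment's ideal target while a neighbor still has nonzero dominance, which mirrors the failure to attain ideal pronunciation described above. Raising alpha for a segment gives it priority over its neighbors for that parameter.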

10.4.4 Prosody

Another complicating factor in realistic lip-sync animation is changing neutral speech to reflect emotional stress. Such stress is referred to as prosody. Effects of prosody include changing the duration, pitch, and amplitude of words or phrases of an utterance. This is an active area of research (e.g., [3][4][5][11][20]).
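To make the three prosodic parameters concrete, the sketch below scales the duration, pitch, and amplitude of selected words in a neutral utterance and recomputes word onset times, since lengthening a word shifts the timing of everything that follows it in the lip-sync track. This is only an illustrative sketch: the Word record, the function names, and the scale factors are hypothetical and are not taken from the cited work.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Word:
    """One word of an utterance with neutral prosody (hypothetical structure)."""
    text: str
    duration: float   # seconds
    pitch: float      # fundamental frequency in Hz
    amplitude: float  # linear gain, 1.0 = neutral


def stress(word: Word, duration_scale: float = 1.0,
           pitch_scale: float = 1.0, amplitude_scale: float = 1.0) -> Word:
    """Return a copy of the word with its three prosodic parameters scaled."""
    return replace(
        word,
        duration=word.duration * duration_scale,
        pitch=word.pitch * pitch_scale,
        amplitude=word.amplitude * amplitude_scale,
    )


def apply_prosody(words: list[Word], stressed: set[int], **scales: float) -> list[Word]:
    """Scale only the words whose indices appear in `stressed`."""
    return [stress(w, **scales) if i in stressed else w for i, w in enumerate(words)]


def onsets(words: list[Word]) -> list[float]:
    """Start time of each word, assuming the words follow one another without pauses."""
    times, t = [], 0.0
    for w in words:
        times.append(t)
        t += w.duration
    return times


if __name__ == "__main__":
    neutral = [Word("I", 0.15, 120.0, 1.0),
               Word("said", 0.30, 120.0, 1.0),
               Word("no", 0.25, 120.0, 1.0)]
    # Assumed scale factors for an emphasized final word.
    emphatic = apply_prosody(neutral, {2}, duration_scale=1.5,
                             pitch_scale=1.2, amplitude_scale=1.3)
    for w, start in zip(emphatic, onsets(emphatic)):
        print(f"{w.text:>5}  start={start:.2f}s  dur={w.duration:.3f}s  "
              f"pitch={w.pitch:.0f}Hz  amp={w.amplitude:.2f}")
```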

10.5 Chapter summary

Facial animation presents interesting challenges. As opposed to most other areas of computer animation, the foundational science is largely incomplete. This, coupled with the complexity inherent in the facial structure of muscles, bones, fatty tissue, and other anatomic elements, makes facial animation one area that has not been conquered by the computer.
