(Kalberer et al., 2001; Kalberer et al., 2002a) and by Kshirsagar (2001), but for
fewer points on the face. Moreover, their Viseme Spaces were based on PCA
(Principal Component Analysis), not ICA. A justification for using ICA rather
than PCA follows later.
Straightforward point-to-point navigation as a way of concatenating visemes
would yield jerky motions. Moreover, when generating the temporal samples,
these may not precisely coincide with the pace at which visemes change. Both
problems are solved by fitting splines to the Viseme Space coordinates of the
visemes. This yields smoother changes and allows us to interpolate in order to
get the facial expressions needed at the fixed times of subsequent frames. We
used NURBS curves of order three.
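As a sketch of this resampling step, the snippet below fits an interpolating B-spline of order three (degree two) through hypothetical Viseme Space coordinates attached to viseme times, and then evaluates it at the fixed frame times of the animation. The keyframe times, coordinate values, and the 25 fps frame rate are illustrative assumptions, not values from the text:

```python
import numpy as np
from scipy.interpolate import splrep, splev

# Hypothetical viseme keyframes: times (seconds) from the phoneme
# alignment, values are one Viseme Space (ICA) coordinate per viseme.
key_times = np.array([0.00, 0.12, 0.25, 0.40, 0.55])
key_values = np.array([0.0, 0.8, 0.3, -0.5, 0.1])

# Fit a degree-2 (order-three) B-spline; s=0 makes it pass exactly
# through the keyframes, giving smooth changes between visemes.
tck = splrep(key_times, key_values, k=2, s=0)

# Resample at the fixed frame times of a 25 fps animation, which in
# general do not coincide with the times at which visemes change.
frame_times = np.arange(0.0, 0.55, 1.0 / 25.0)
frame_values = splev(frame_times, tck)
```

The same fit is repeated independently for each Viseme Space coordinate, and the resampled values together give the facial expression at every frame.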
A word on the implementation of co-articulation effects is in order here. A
distinction is made between vowels and labial consonants on the one hand, and
the remainder of the visemes on the other. The former impose their deformations
much more strictly onto the animation than the latter, which can be pronounced
with a lot of visual variation. In terms of the spline fitting, this means that the
animation trajectory will move precisely through the former visemes and will only
be attracted towards the latter. Figure 13 illustrates this for one Viseme Space
coordinate.
Initially a spline is fitted through the values of the corresponding component for
the visemes of the former category. Then, its course is modified by bending it
towards the coordinate values of the visemes in the latter category. This second
category is subdivided into three subcategories: (1) somewhat labial consonants
like those corresponding to the /ch,jh,sh,zh/ viseme pull stronger than (2) the
viseme /f,v/ , which in turn pulls stronger than (3) the remaining visemes of the
second category. In all three cases the same influence is given to the rounded
and widened versions of these visemes. The distance between the current spline
(determined by vowels and labial consonants) and its position if it had to go
through these visemes is reduced to (1) 20%, (2) 40%, and (3) 70%, respectively.
These are also shown in Figure 13. These percentages were set by comparing
animations against 3D ground truth. If an example face is animated with the
same audio track used for training, such a comparison can easily be made, and
deviations can be minimized by optimizing these parameters. So far, only
distances between lip positions were taken into account.
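The attraction step above can be sketched as follows. Under the reduction scheme described in the text, the spline value at a weak viseme is pulled toward that viseme's coordinate so that only a fraction of the original distance remains: 20% for the /ch,jh,sh,zh/ group, 40% for /f,v/, and 70% for the remaining visemes. The function name and calling convention are illustrative, not from the original implementation:

```python
# Fraction of the original distance that remains after attraction,
# per subcategory of the weaker visemes (values from the text).
RETAINED_FRACTION = {
    "ch_jh_sh_zh": 0.20,  # somewhat labial consonants: strongest pull
    "f_v": 0.40,          # intermediate pull
    "other": 0.70,        # remaining weak visemes: weakest pull
}

def attract(spline_value: float, viseme_value: float, group: str) -> float:
    """Bend the spline toward a weak viseme's Viseme Space coordinate.

    The new value lies between the current spline (determined by vowels
    and labial consonants) and the viseme's own coordinate, keeping only
    the retained fraction of the original distance.
    """
    r = RETAINED_FRACTION[group]
    return viseme_value + r * (spline_value - viseme_value)
```

For example, if the fitted spline passes at distance 1.0 from an /f,v/ viseme coordinate, the bent curve passes at distance 0.4 from it; the rounded and widened versions of a viseme receive the same influence.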
Modifications by the Animator
A tool that automatically generates a face animation which the animator then has
to take or leave is a source of frustration rather than a help. The computer cannot
replace the creative component that the human expert brings to the animation.