For the 3D shape extraction of the talking face, we have used a 3D acquisition
system that uses structured light (Eyetronics, 1999). It projects a grid onto the
face, and extracts the 3D shape and texture from a single image. By using a video
camera, a quick succession of 3D snapshots can be gathered. We are especially
interested in frames that represent the different visemes. These are the frames
where the lips reach their extremal positions for the corresponding sound (Ezzat
and Poggio (2000) followed the same approach in 2D). The acquisition system
yields the 3D coordinates of several thousand points for every frame. The output
is a triangulated, textured surface. The problem is that the 3D points correspond
to projected grid intersections, not to fixed physical points on the face.
Hence, the points for which 3D coordinates are given change from frame to
frame. The next steps have to solve for the physical correspondences.
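Selecting the viseme frames from the video sequence amounts to finding the frames where the lip motion reverses direction. The sketch below illustrates one simple way to do this; the per-frame mouth-opening measure and the function name are our own illustrative assumptions, not part of the acquisition system described above.

```python
import numpy as np

def extremal_frames(mouth_opening):
    """Pick candidate viseme frames as local extrema of a per-frame
    mouth-opening signal (a hypothetical measure, e.g. the distance
    between upper- and lower-lip grid points in each 3D snapshot).

    Plateaus (zero difference) are skipped; this is only a sketch of
    the idea, not the authors' actual selection procedure."""
    x = np.asarray(mouth_opening, dtype=float)
    d = np.diff(x)
    # A sign change in the finite difference marks a local extremum.
    idx = np.where(np.sign(d[:-1]) * np.sign(d[1:]) < 0)[0] + 1
    return idx
```

For instance, a signal that rises to a maximum while a vowel is articulated and falls back afterwards yields exactly the frame indices at which the lips reach their extremal positions.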
Fitting of the Generic Head Model
Our animation approach assumes a specific topology for the face mesh. This is
a triangulated surface with 2,268 vertices for the skin, supplemented with
separate meshes for the eyes, teeth, and tongue (another 8,848 vertices, mainly for the
teeth). Figure 3 shows the generic head and its topology.
The first step in this fitting procedure deforms the generic head by a simple
rotation, translation, and anisotropic scaling operation, to crudely align it with the
neutral shape of the example face. This transformation minimizes the average
distance between a number of special points on the example face and the model.
Figure 3. The generic head model that is fitted to the scanned 3D data of
the example face. Left: Shaded version; Right: Underlying mesh.
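The crude alignment step above can be sketched in code. The function below estimates a rotation, anisotropic (per-axis) scale, and translation that minimize the mean squared distance between corresponding landmark points, by alternating a closed-form rotation step (Kabsch, via SVD) with a closed-form per-axis scale step. The alternating scheme and all names are our own illustrative assumptions; the source does not specify how the minimization is solved.

```python
import numpy as np

def align_landmarks(model_pts, target_pts, n_iters=200):
    """Fit y ~= R @ (s * x) + t for (n, 3) landmark arrays, where R is a
    rotation, s an anisotropic per-axis scale, and t a translation.
    Alternates a Kabsch rotation step with a least-squares scale step
    (a sketch; not necessarily the authors' solver)."""
    X = np.asarray(model_pts, dtype=float)   # generic-model landmarks
    Y = np.asarray(target_pts, dtype=float)  # scanned-face landmarks
    # Work with centered point sets; translation is recovered at the end.
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    s = np.ones(3)
    for _ in range(n_iters):
        # Rotation step (Kabsch): best R for the currently scaled model.
        Xs = Xc * s
        U, _, Vt = np.linalg.svd(Xs.T @ Yc)
        d = np.sign(np.linalg.det(Vt.T @ U.T))
        R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T  # proper rotation
        # Scale step: with R fixed, each axis scale has a closed form.
        Yr = Yc @ R  # targets rotated back into the model frame
        s = (Xc * Yr).sum(axis=0) / (Xc * Xc).sum(axis=0)
    t = Y.mean(axis=0) - R @ (X.mean(axis=0) * s)
    return R, s, t
```

With noise-free corresponding landmarks the recovered transform reproduces the targets; with real scan data it only minimizes the average distance, which is all this crude first alignment is meant to achieve before the finer fitting steps.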