was very easy. But the actual solution for the final geometry might not have looked as
good as what you get from complex rigs that contain a lot of nonlinear functions like
skin sliding. The more complex the animation rig, the better the final result looks, but the harder it becomes to map the mocap data onto the rig.
Wedig: In terms of facial motion capture, for the most part we were never aiming
for a 100 percent solution of directly mapping mocap data onto a final animated
character. Our goal is to produce a tool that gets rid of the boring work and the things
that annoyed the animators on previous shows. The animator spends so long on a
shot, and probably eighty to ninety percent of that time is just getting to the point
where it's fun, where the animator is working on all the subtleties, emotional cues,
and micro-expressions of the character. That's where we want our animators focused; we don't want them focused on how the jaw moves or whether the character's making a pucker with their lips.
RJR: Can you comment on the amount of cleanup that mocap data requires these days?
What are the common types of errors and ways to fix them?
Apostoloff: In terms of body mocap, it's fairly straightforward. The capture system
often runs at about 120 Hz, so you have a lot of temporal information in there that
you can use to clean up missing markers.
We often use what we call a “rigid fill” to fill in gaps in marker trajectories. That is,
if a marker appears in one frame but not the next, we solve a least-squares problem to align nearby known markers in both frames with a rigid transformation. Then we can
fill in the missing marker in the current frame by applying the rigid transformation
to its position in the previous frame. People will also use splines to bridge gaps in
trajectories or to remove noise from individual trajectories.
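As a rough sketch of the rigid fill and spline bridging that Apostoloff describes, the code below assumes each frame is a dictionary mapping marker names to 3D numpy positions; the function names and data layout are illustrative, not the studio's actual tools.

import numpy as np
from scipy.interpolate import CubicSpline

def rigid_transform(src, dst):
    # Least-squares rotation R and translation t mapping the (N, 3) point set
    # src onto dst (the classic Kabsch/Procrustes solution).
    src_c = src - src.mean(axis=0)
    dst_c = dst - dst.mean(axis=0)
    U, _, Vt = np.linalg.svd(src_c.T @ dst_c)
    d = np.sign(np.linalg.det(Vt.T @ U.T))      # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst.mean(axis=0) - R @ src.mean(axis=0)
    return R, t

def rigid_fill(prev_frame, curr_frame, missing):
    # Align the markers visible in both frames, then carry the missing marker
    # forward from the previous frame with that rigid transformation.
    common = [m for m in curr_frame if m in prev_frame and m != missing]
    src = np.array([prev_frame[m] for m in common])
    dst = np.array([curr_frame[m] for m in common])
    R, t = rigid_transform(src, dst)
    return R @ prev_frame[missing] + t

def spline_bridge(observed_times, observed_positions, gap_times):
    # Bridge a gap in a single marker's trajectory with a cubic spline fitted
    # to the frames where the marker was observed (positions: (N, 3)).
    spline = CubicSpline(observed_times, observed_positions)
    return spline(gap_times)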
For facial motion capture, we've developed several more advanced algorithms.
Prior to going into production, we put the actors into a separate capture volume and
have them go over a set of facial poses that try to cover the entire gamut of what they
can do in terms of facial deformation. These include a mix of visemes, facial action
coding units, and emotional ranges. These facial poses are recorded using the MOVA
Contour system. We remove the head motions and use those to build an actor-specific statistical model of what their face does when it deforms, which we put into a
Bayesian framework to estimate missing markers when they occur. The prior term is
built from the captured training data, and the data likelihood term involves what you
get back from the mocap. We use simple Gaussian models for the face, and they work well, but we'd need a different approach for the body since there's more articulated
motion.
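As a simple illustration of that idea, one can fit a multivariate Gaussian to the stabilized training poses and take the conditional mean of the missing coordinates given the observed ones; the sketch below assumes poses are flattened into vectors of marker coordinates and is not the production solver.

import numpy as np

def fit_face_prior(training_poses):
    # training_poses: (F, D) array, one head-stabilized pose per row, flattened
    # to D = 3 * num_markers coordinates. A small ridge keeps the covariance
    # invertible when there are few training poses.
    mean = training_poses.mean(axis=0)
    cov = np.cov(training_poses, rowvar=False) + 1e-6 * np.eye(training_poses.shape[1])
    return mean, cov

def estimate_missing(frame, observed_idx, missing_idx, mean, cov):
    # Conditional mean of the missing coordinates given the observed ones:
    # mu_m + C_mo C_oo^{-1} (x_o - mu_o).
    x_o = frame[observed_idx]
    mu_o, mu_m = mean[observed_idx], mean[missing_idx]
    C_oo = cov[np.ix_(observed_idx, observed_idx)]
    C_mo = cov[np.ix_(missing_idx, observed_idx)]
    return mu_m + C_mo @ np.linalg.solve(C_oo, x_o - mu_o)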
Wedig: Another approach we use is to build a face model as a linear combination
of selected shapes from the training process. We can then project mocap data onto
this model, with a bias toward having large contributions from a small set of shapes.
The animator can then use the shapes “activated” in a given frame as a basis for
subsequent animation.
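One way to realize that bias toward a few strongly activated shapes is an L1-penalized least-squares fit of each frame against the shape basis; the sketch below uses scikit-learn's Lasso, and the penalty form, names, and data layout are assumptions made for illustration.

import numpy as np
from sklearn.linear_model import Lasso

def project_onto_shapes(frame, neutral, shapes, sparsity=0.01):
    # frame, neutral: (D,) flattened marker positions; shapes: (K, D) delta
    # shapes from the training session. Returns K weights, most of them zero.
    delta = frame - neutral                        # deformation to explain
    model = Lasso(alpha=sparsity, fit_intercept=False)
    model.fit(shapes.T, delta)                     # columns = shapes, target = delta
    return model.coef_                             # sparse activation weights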