Motion Capture - Computer Vision for Visual Effects


et al. [570] and Ma et al. [309]. We'll discuss structured light approaches in detail in

Section 8.2.

If the goal is simply to record the general shape and pose of the performer's face at

each instant, then a lower-resolution approach such as fitting an active appearance

model to single-camera video [316] is more appropriate than full motion capture.

7.7 MARKERLESS MOTION CAPTURE

Finally, we discuss markerless motion capture, the problem of estimating human

pose from images alone, without identifiable markers and preferably without

constraints on the performer's clothing or environment. Determining a human's pose

in an image and tracking him/her through a video sequence are two of the most

studied problems in computer vision, so we can only give a brief overview of this

research area here. We'll focus on approaches that have the same goals as markered

motion capture — that is, algorithms that estimate an articulated skeleton from a set

of images.

To form relationships between the images and the kinematic model, markerless

methods generally assume that a solid 3D human model can be created for each

pose. As mentioned in Section 7.4.4, this solid model can be composed of ellipsoids or

tapered cylinders, or it can be a more detailed model of the human musculature [365].

With the increased availability of full-body 3D scanners (see Section 8.2), it is growing

more common to use a detailed triangulated mesh captured from the performer

him/herself for the body model. Such a triangulated mesh can be skinned with respect

to the underlying kinematic model, or parameterized in a lower-dimensional space

based on analyzing training data [12, 15].
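
As a rough illustration of the simplest kind of solid model mentioned above, the sketch below tests whether a 3D point lies inside one tapered cylinder spanning two joints of the skeleton. The function name, the joint positions, and the radii are hypothetical values chosen for illustration; a real body model would chain many such segments and take their dimensions from the performer.

```python
import math


def tapered_cylinder_contains(point, joint_a, joint_b, radius_a, radius_b):
    """Return True if `point` lies inside a tapered cylinder (truncated cone)
    whose axis runs from joint_a (radius_a) to joint_b (radius_b)."""
    axis = [b - a for a, b in zip(joint_a, joint_b)]
    length_sq = sum(c * c for c in axis)
    if length_sq == 0.0:
        raise ValueError("the two joints must be distinct")

    # Fractional position along the axis: 0 at joint_a, 1 at joint_b.
    rel = [p - a for p, a in zip(point, joint_a)]
    t = sum(r * c for r, c in zip(rel, axis)) / length_sq
    if t < 0.0 or t > 1.0:
        return False  # beyond one of the flat end caps

    # Perpendicular distance from the point to the axis.
    closest = [a + t * c for a, c in zip(joint_a, axis)]
    dist = math.dist(point, closest)

    # The radius varies linearly between the two ends.
    radius = (1.0 - t) * radius_a + t * radius_b
    return dist <= radius


# Hypothetical forearm segment: 25 cm long, 4 cm radius at the elbow,
# 3 cm radius at the wrist (units are meters).
elbow, wrist = (0.0, 0.0, 0.0), (0.0, 0.0, 0.25)
near_elbow = tapered_cylinder_contains((0.035, 0.0, 0.01), elbow, wrist, 0.04, 0.03)
near_wrist = tapered_cylinder_contains((0.035, 0.0, 0.24), elbow, wrist, 0.04, 0.03)
```

The same offset of 3.5 cm from the axis falls inside the segment near the elbow but outside it near the wrist, which is the effect of the taper. Rendering such primitives for a candidate pose gives a predicted silhouette that can be compared with the images.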

First, we describe the approach, common to most markerless motion capture

algorithms, of formulating pose estimation using a dynamical system. We then

review how silhouettes and edges of the performer extracted from multicamera video

can be used as the basis for estimating pose. Finally, we discuss how silhouettes

can be backprojected into world coordinates to create visual hulls, constraining the

estimation problem in 3D rather than 2D.
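
To make the visual hull idea concrete, here is a minimal sketch of voxel carving: a 3D sample point is kept only if it projects inside the silhouette in every view. It assumes toy orthographic views along the coordinate axes and hand-made binary silhouettes; the names carve_visual_hull, view_along_z, and view_along_x are hypothetical, and a real system would use calibrated perspective cameras and silhouettes segmented from video.

```python
def carve_visual_hull(grid_points, views):
    """Keep the 3D points whose projection lies inside every silhouette.

    views is a list of (project, silhouette) pairs. project maps a 3D point
    to integer (row, col) pixel coordinates, and silhouette is a 2D list of
    booleans where True marks the performer.
    """
    hull = []
    for point in grid_points:
        inside_all = True
        for project, silhouette in views:
            row, col = project(point)
            in_image = 0 <= row < len(silhouette) and 0 <= col < len(silhouette[0])
            if not (in_image and silhouette[row][col]):
                inside_all = False  # carved away by this view
                break
        if inside_all:
            hull.append(point)
    return hull


# Toy 4 x 4 x 4 voxel grid.
grid = [(x, y, z) for x in range(4) for y in range(4) for z in range(4)]


def view_along_z(p):
    # Assumed orthographic camera looking down the z axis: row = y, col = x.
    return (p[1], p[0])


def view_along_x(p):
    # Assumed orthographic camera looking down the x axis: row = y, col = z.
    return (p[1], p[2])


# Hand-made silhouette: a 2 x 2 square in the middle of a 4 x 4 image.
square = [[1 <= r <= 2 and 1 <= c <= 2 for c in range(4)] for r in range(4)]

hull_one_view = carve_visual_hull(grid, [(view_along_z, square)])
hull_two_views = carve_visual_hull(grid, [(view_along_z, square), (view_along_x, square)])
```

With a single view the surviving voxels form a column that extends along the viewing direction; adding the second view trims that column to a 2 x 2 x 2 block. Each extra silhouette can only remove voxels, so the hull tightens around the performer as views are added.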

Markerless motion capture algorithms aren't generally used for production-

quality visual effects. The estimated 3D trajectories of points are less accurate, since

the underlying 2D correspondences of features in unconstrained video can't be found

as accurately and robustly as the highly engineered retro-reflective markers in a

conventional motion capture system.¹⁴ Furthermore, the connection between 2D

tracked features and the underlying kinematic model is less strict, since street clothes

are looser and move more freely than a body suit. Also, the image features are

automatically chosen by the algorithm instead of carefully engineered to give maximal

information about the skeleton.

In general, markerless systems can produce good estimates for the general pose

of a human's limbs in a video sequence, but are unlikely to yield the fine-detail,

14 Markered motion capture systems can triangulate 3D markers to sub-millimeter accuracy. In

contrast, markerless motion capture systems often use markered motion capture as a

ground-truth reference, and the best algorithms usually report 3D errors of around

three centimeters relative to these measurements.
