et al. [ 570 ] and Ma et al. [ 309 ]. We'll discuss structured light approaches in detail in
Section 8.2.
If the goal is simply to record the general shape and pose of the performer's face at
each instant, then a lower-resolution approach such as fitting an active appearance
model to single-camera video [ 316 ] is more appropriate than full motion capture.
7.7 MARKERLESS MOTION CAPTURE
Finally, we discuss markerless motion capture, the problem of estimating human
pose from images alone, without identifiable markers and preferably without con-
straints on the performer's clothing or environment. Determining a human's pose
in an image and tracking him/her through a video sequence are two of the most
studied problems in computer vision, so we can only give a brief overview of this
research area here. We'll focus on approaches that have the same goals as markered
motion capture — that is, algorithms that estimate an articulated skeleton from a set
of images.
To form relationships between the images and the kinematic model, markerless
methods generally assume that a solid 3D human model can be created for each
pose. As mentioned in Section 7.4.4, this solid model can be composed of ellipsoids or
tapered cylinders, or it can be a more detailed model of the human musculature [ 365 ].
With the increased availability of full-body 3D scanners (see Section 8.2), it is growing
more common to use a detailed triangulated mesh captured from the performer
him/herself for the body model. Such a triangulated mesh can be skinned with respect
to the underlying kinematic model, or parameterized in a lower-dimensional space
based on analyzing training data [ 12 , 15 ].
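
As a rough illustration of the simpler solid models mentioned above, the sketch below represents one body segment as a tapered cylinder in its own local coordinate frame. The class name, field names, and forearm dimensions are hypothetical choices made for this example; they do not come from any of the cited systems.

```python
from dataclasses import dataclass


@dataclass
class TaperedCylinder:
    """One body segment with its axis along local z, from z = 0 to z = length."""
    length: float
    radius_start: float  # radius at the proximal joint (z = 0)
    radius_end: float    # radius at the distal joint (z = length)

    def radius_at(self, z: float) -> float:
        # Linearly interpolate the radius along the segment axis.
        t = z / self.length
        return (1.0 - t) * self.radius_start + t * self.radius_end

    def contains(self, point) -> bool:
        # point is (x, y, z) expressed in the segment's local frame.
        x, y, z = point
        if not 0.0 <= z <= self.length:
            return False
        return x * x + y * y <= self.radius_at(z) ** 2


# Hypothetical forearm: 25 cm long, tapering from 4.5 cm to 3 cm in radius.
forearm = TaperedCylinder(length=0.25, radius_start=0.045, radius_end=0.03)
```

In a full body model, each such segment would be attached to a joint of the kinematic skeleton, so that a change in joint angles moves the segment's local frame and with it the solid.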
First, we describe the general approach common to most markerless motion cap-
ture algorithms of formulating pose estimation using a dynamical system. We then
review how silhouettes and edges of the performer extracted from multicamera video
can be used as the basis for estimating pose. Finally, we discuss how silhouettes
can be backprojected into world coordinates to create visual hulls, constraining the
estimation problem in 3D rather than 2D.
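
To make the idea of backprojecting silhouettes concrete, here is a minimal voxel-carving sketch. It assumes a toy setup with three orthographic cameras looking along the coordinate axes and silhouettes stored as sets of foreground pixels; a real system would use calibrated perspective cameras instead. All function names are illustrative.

```python
from itertools import product


def carve_visual_hull(grid_size, views):
    """Keep the voxels whose projection lies inside every silhouette.

    views is a list of (project, silhouette) pairs: project maps a voxel
    index (x, y, z) to a pixel (u, v), and silhouette is the set of
    foreground pixels seen by that camera.
    """
    hull = set()
    for voxel in product(range(grid_size), repeat=3):
        if all(project(voxel) in silhouette for project, silhouette in views):
            hull.add(voxel)
    return hull


# Assumed toy cameras: orthographic projections along the x, y, and z axes.
def along_x(v):
    return (v[1], v[2])


def along_y(v):
    return (v[0], v[2])


def along_z(v):
    return (v[0], v[1])


# Each camera sees the same 2 x 2 square silhouette inside a 4 x 4 image.
square = {(a, b) for a in (1, 2) for b in (1, 2)}
views = [(along_x, square), (along_y, square), (along_z, square)]

hull = carve_visual_hull(4, views)
loose_hull = carve_visual_hull(4, views[:1])  # one camera only
```

With a single camera the hull is an unbounded extrusion of the silhouette along the viewing direction (clipped here by the grid); each additional view can only remove voxels, which is why visual hulls tighten as cameras are added.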
Markerless motion capture algorithms aren't generally used for production-
quality visual effects. The estimated 3D trajectories of points are less accurate, since
the underlying 2D correspondences of features in unconstrained video can't be found
as accurately and robustly as the highly engineered retro-reflective markers in a
conventional motion capture system (see footnote 14). Furthermore, the connection between 2D
tracked features and the underlying kinematic model is less strict, since street clothes
are looser and move more freely than a body suit. Also, the image features are auto-
matically chosen by the algorithm instead of carefully engineered to give maximal
information about the skeleton.
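
Accuracy comparisons of this kind are usually reported as a 3D error against a markered ground-truth capture, as footnote 14 describes. The sketch below shows one plausible way to compute such a figure, the mean Euclidean distance between corresponding joints; the function name and the joint positions are made up for illustration, and individual papers may define their error measures differently.

```python
import math


def mean_joint_error(estimated, reference):
    """Mean Euclidean distance between corresponding 3D joint positions."""
    if len(estimated) != len(reference) or not estimated:
        raise ValueError("need two non-empty joint lists of equal length")
    total = sum(math.dist(e, r) for e, r in zip(estimated, reference))
    return total / len(estimated)


# Made-up joint positions in meters, for illustration only.
reference = [(0.0, 1.0, 0.0), (0.2, 1.4, 0.1)]
estimated = [(0.03, 1.0, 0.0), (0.2, 1.4, 0.14)]

error_m = mean_joint_error(estimated, reference)
```

Here the two joints are off by 3 cm and 4 cm respectively, so the mean error is about 3.5 cm.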
In general, markerless systems can produce good estimates for the general pose
of a human's limbs in a video sequence, but are unlikely to yield the fine-detail,
Footnote 14: Markered motion capture systems can triangulate 3D markers to sub-millimeter accuracy. In
contrast, markerless motion capture systems often use markered motion capture as a ground-truth
reference, and the best algorithms usually report 3D errors relative to these measurements of
around three centimeters.
 