When the probability densities in Equation (7.35) are modeled using Gaussian
distributions, the computation of the posterior reduces to the Kalman filter, a
well-known signal processing algorithm [165]. However, in the motion capture problem,
both densities are poorly modeled by Gaussians (in particular, they are multimodal)
and a more appropriate approach is particle filtering [212]. In particle filtering, the
posterior density is represented as a set of samples {s_k} of the distribution, each with
a probability {π_k}. This allows us to easily extract a single estimate of the current
state (either by selecting the sample with the highest probability or by computing a
weighted average of the samples based on their probabilities) or to retain multiple
hypotheses about the current state (given by the top modes of the sample set).
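
As a rough illustration of the sample-based representation described above, the following sketch stores the samples and their probabilities as NumPy arrays and extracts the two single-state estimates mentioned in the text. The function names, the generic resample-predict-reweight update, and the toy values are assumptions made for illustration only; they are not the specific models used for motion capture.

```python
import numpy as np

def normalize(weights):
    """Scale nonnegative weights so they sum to one."""
    w = np.asarray(weights, dtype=float)
    return w / w.sum()

def best_sample(samples, weights):
    """Single estimate: the sample with the highest probability."""
    return samples[np.argmax(weights)]

def weighted_mean(samples, weights):
    """Single estimate: probability-weighted average of the samples."""
    return normalize(weights) @ samples

def filter_step(samples, weights, transition, likelihood, observation, rng):
    """One generic resample-predict-reweight update (bootstrap-style sketch)."""
    n = len(samples)
    # Resample: draw samples in proportion to their current probabilities.
    idx = rng.choice(n, size=n, p=normalize(weights))
    # Predict: push each resampled state through the (assumed) transition model.
    predicted = transition(samples[idx], rng)
    # Reweight: score each predicted state against the new observation.
    return predicted, normalize(likelihood(observation, predicted))

# Toy usage with three 2-D samples (hypothetical values).
samples = np.array([[0.0, 0.0], [2.0, 2.0], [4.0, 0.0]])
weights = np.array([1.0, 2.0, 1.0])
print(best_sample(samples, weights))    # [2. 2.]
print(weighted_mean(samples, weights))  # [2. 1.]
```

The two estimators can disagree when the sample set is multimodal: the weighted average may fall between modes, which is one reason to keep several hypotheses rather than a single state.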
However, since the state space for human pose is very large (that is, the vector
θ(t) is usually at least thirty-dimensional), a standard particle filter would require
an intractable number of samples to accurately represent the posterior density.
Deutscher and Reid [119] proposed a modified particle filter for pose estimation
that borrows ideas from simulated annealing and genetic algorithms to successively
refine the estimate of the posterior with a viable number of samples. An alternative
approach proposed by Sminchisescu and Triggs [457] focuses the samples in regions
with high uncertainty.
Another way to deal with the large state space is to reduce its dimensionality.
For example, a specific action such as walking has fewer degrees of freedom than
a generic pose, which can be revealed by analyzing a training dataset using principal
component analysis [447] or a more sophisticated latent variable model (see
Section 7.4.3).
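
The dimensionality reduction idea can be sketched with plain principal component analysis. The code below is a minimal illustration, assuming training poses are stacked as rows of an array; the function names and the synthetic data (thirty-dimensional poses generated from three hidden degrees of freedom) are hypothetical and stand in for a real training dataset.

```python
import numpy as np

def fit_pca(poses, n_components):
    """Return the mean pose and the top principal directions (as rows)."""
    mean = poses.mean(axis=0)
    # The right singular vectors of the centered data are the principal directions.
    _, _, vt = np.linalg.svd(poses - mean, full_matrices=False)
    return mean, vt[:n_components]

def project(poses, mean, basis):
    """Map full pose vectors to low-dimensional coefficients."""
    return (poses - mean) @ basis.T

def reconstruct(coeffs, mean, basis):
    """Map low-dimensional coefficients back to full pose vectors."""
    return mean + coeffs @ basis

# Synthetic stand-in for training data: 200 poses in 30-D driven by 3 hidden factors.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 3))
mixing = rng.normal(size=(3, 30))
poses = latent @ mixing + 5.0

mean, basis = fit_pca(poses, n_components=3)
coeffs = project(poses, mean, basis)
print(coeffs.shape)  # (200, 3)
```

A filter could then track the low-dimensional coefficients instead of the full pose vector, at the cost of only representing poses close to those in the training data.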
Modeling the state transition likelihood p(θ(t) | θ(t−1)) in Equation (7.36) is
similar to the methods discussed in Section 7.4.3. For example, we can use
single-frame and dynamical constraints based on biomechanical training data, in addition
to incorporating character- or activity-specific learned models. In the rest of this
section, we briefly overview typical features for markerless motion capture, which
are used to form the observation likelihood p(r(t) | θ(t)) in Equation (7.36).
7.7.2 Silhouettes and Edges
Conventional motion capture systems use a large number of cameras to triangulate
the observed images of the markers. In contrast, markerless systems often use a
smaller number of cameras (as few as one), using features extracted in the set of
images at each instant as the basis for pose estimation. When multiple cameras are
involved, they must be calibrated using methods similar to those in Section 7.1.
First, we illustrate why the markerless problem is hard. Figure 7.20 shows several
inherent difficulties with estimating pose from a single image. To begin with, we face the
challenging problem of isolating the human figure from a non-ideal background, a
natural image matting problem of the type discussed in Chapter 2. This can be mitigated
by using as simple a background as possible, but markerless methods rarely
go so far as to use a green screen.
Assuming we've accurately segmented the human from the background, several
challenges remain. Foremost, in the absence of carefully placed markers and tight-fitting
clothing, the positions of joints are much more difficult to infer from an image.
For example, in Figure 7.20 the torso and arms all have the same texture and are