Motion Capture - Computer Vision for Visual Effects

When the probability densities in Equation (7.35) are modeled using Gaussian distributions, the computation of the posterior reduces to the Kalman filter, a well-known signal processing algorithm [165]. However, in the motion capture problem, both densities are poorly modeled by Gaussians (in particular, they are multimodal), and a more appropriate approach is particle filtering [212]. In particle filtering, the posterior density is represented as a set of samples $\{s_k\}$ of the distribution with associated probabilities $\{\pi_k\}$. This allows us to easily extract a single estimate of the current state (either by selecting the sample with the highest probability or by computing a weighted average of the samples based on their probabilities) or to retain multiple hypotheses about the current state (given by the top modes of the sample set).
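
As a minimal sketch of the ways of reading off an estimate described above, the Python below assumes each sample $s_k$ is stored as a tuple of pose parameters and each weight as a non-negative float. The function names are hypothetical and not taken from any particular library; "top hypotheses" here simply means the highest-weight samples, a crude proxy for the top modes.

```python
from typing import List, Sequence, Tuple

Pose = Tuple[float, ...]  # hypothetical stand-in for a sample s_k of theta(t)


def normalize(weights: Sequence[float]) -> List[float]:
    """Rescale non-negative weights so they sum to one (the probabilities pi_k)."""
    total = sum(weights)
    if total <= 0.0:
        raise ValueError("weights must have a positive sum")
    return [w / total for w in weights]


def max_weight_estimate(samples: Sequence[Pose], weights: Sequence[float]) -> Pose:
    """Return the sample with the highest probability."""
    best = max(range(len(samples)), key=lambda k: weights[k])
    return samples[best]


def weighted_mean_estimate(samples: Sequence[Pose], weights: Sequence[float]) -> Pose:
    """Return the probability-weighted average of the samples, per state dimension."""
    pi = normalize(weights)
    dims = len(samples[0])
    return tuple(sum(p * s[d] for p, s in zip(pi, samples)) for d in range(dims))


def top_hypotheses(samples: Sequence[Pose], weights: Sequence[float], n: int) -> List[Pose]:
    """Return the n highest-weight samples as a crude proxy for the top modes."""
    order = sorted(range(len(samples)), key=lambda k: weights[k], reverse=True)
    return [samples[k] for k in order[:n]]


samples = [(0.0, 0.0), (1.0, 2.0), (3.0, 0.0)]
weights = [1.0, 2.0, 1.0]
print(max_weight_estimate(samples, weights))     # (1.0, 2.0)
print(weighted_mean_estimate(samples, weights))  # (1.25, 1.0)
```

When the posterior is multimodal, the weighted mean can land between modes in a region of low probability: two equally weighted samples at -1 and +1 average to 0 even if no plausible pose lies there. This is one reason to keep several hypotheses rather than collapsing to a single estimate.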

However, since the state space for human pose is very large (that is, the vector $\theta(t)$ is usually at least thirty-dimensional), a standard particle filter would require an intractable number of samples to accurately represent the posterior density.
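
The growth in sample count can be illustrated with crude arithmetic. The sketch below assumes, purely for illustration, that the state is normalized to the unit cube and that the posterior mass sits in a small axis-aligned box around the true pose; real posteriors are not boxes, but the exponential dependence on dimension is the point. The function names are hypothetical.

```python
def grid_cells(bins_per_dim: int, dims: int) -> int:
    """Cells in a regular grid with bins_per_dim cells along each of dims axes."""
    return bins_per_dim ** dims


def hit_probability(half_width: float, dims: int) -> float:
    """Chance that one sample drawn uniformly from [0, 1]^dims falls inside a
    box of side 2 * half_width (assumed to lie entirely within the cube)."""
    return (2.0 * half_width) ** dims


for d in (3, 10, 30):
    p = hit_probability(0.1, d)
    print(f"dims={d:2d}  grid cells={grid_cells(10, d):.1e}  "
          f"hit prob={p:.1e}  expected draws per hit={1.0 / p:.1e}")
```

With a box of side 0.2, roughly 125 uniform draws are needed per hit in three dimensions, but on the order of $10^{21}$ in thirty.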

Deutscher and Reid [119] proposed a modified particle filter for pose estimation that borrows ideas from simulated annealing and genetic algorithms to successively refine the estimate of the posterior with a tractable number of samples. An alternative approach proposed by Sminchisescu and Triggs [457] focuses the samples in regions with high uncertainty.
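
The annealing idea can be sketched as a sequence of layers in which the particle weights are sharpened by raising the likelihood to an increasing power $\beta$ while the diffusion noise shrinks. The Python below is a simplified illustration under our own assumptions (a user-supplied likelihood, hand-picked schedules for $\beta$ and the noise, plain multinomial resampling). It is not Deutscher and Reid's exact algorithm and omits the crossover step borrowed from genetic algorithms; all names, including the toy `gaussian_bump` likelihood, are hypothetical.

```python
import math
import random
from typing import Callable, List, Sequence

Pose = List[float]


def annealed_search(
    likelihood: Callable[[Sequence[float]], float],
    particles: List[Pose],
    betas: Sequence[float],
    noise_scales: Sequence[float],
    rng: random.Random,
) -> List[Pose]:
    """Refine particles through annealing layers (a simplified sketch).

    Each layer weights particles by likelihood(x) ** beta, resamples in
    proportion to those weights, then perturbs every copy with zero-mean
    Gaussian noise of standard deviation sigma."""
    for beta, sigma in zip(betas, noise_scales):
        weights = [likelihood(p) ** beta for p in particles]
        if sum(weights) <= 0.0:
            weights = [1.0] * len(particles)  # every weight underflowed: resample uniformly
        chosen = rng.choices(particles, weights=weights, k=len(particles))
        particles = [[x + rng.gauss(0.0, sigma) for x in p] for p in chosen]
    return particles


def gaussian_bump(center: Sequence[float]) -> Callable[[Sequence[float]], float]:
    """Hypothetical toy likelihood peaked at center (stands in for image evidence)."""
    def lik(x: Sequence[float]) -> float:
        return math.exp(-sum((xi - ci) ** 2 for xi, ci in zip(x, center)))
    return lik


rng = random.Random(0)
target = [1.0, -2.0, 0.5]
start = [[rng.uniform(-5.0, 5.0) for _ in range(3)] for _ in range(500)]
final = annealed_search(gaussian_bump(target), start,
                        betas=[0.2, 0.5, 1.0, 2.0, 4.0],
                        noise_scales=[1.0, 0.5, 0.25, 0.1, 0.05], rng=rng)
mean = [sum(p[d] for p in final) / len(final) for d in range(3)]
print([round(m, 2) for m in mean])
```

Raising $\beta$ slowly lets the particles explore broadly before committing; raising it too quickly tends to collapse the set onto a local maximum.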

Another way to deal with the large state space is to reduce its dimensionality. For example, a specific action such as walking has fewer degrees of freedom than a generic pose; this lower-dimensional structure can be revealed by analyzing a training dataset using principal component analysis [447] or a more sophisticated latent variable model (see Section 7.4.3).
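
As a minimal sketch of the PCA route, the Python below assumes a training set of pose vectors (for example, joint angles from captured walking sequences) is already available. It finds the leading principal components by power iteration with deflation, which is our own choice and adequate only for small illustrative problems; a production system would use a proper eigensolver or SVD. A particle filter could then sample the low-dimensional coordinates $z$ and map them back to a full pose. All function names are hypothetical.

```python
import math
from typing import List, Sequence

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def mean_pose(poses: Sequence[Sequence[float]]) -> Vector:
    """Average of the training pose vectors, computed per dimension."""
    n = len(poses)
    return [sum(p[d] for p in poses) / n for d in range(len(poses[0]))]


def covariance(poses: Sequence[Sequence[float]], mu: Sequence[float]) -> List[Vector]:
    """Sample covariance matrix of the training poses about their mean mu."""
    centered = [[x - m for x, m in zip(p, mu)] for p in poses]
    dims, n = len(mu), len(poses)
    return [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(dims)]
            for i in range(dims)]


def principal_components(cov: Sequence[Sequence[float]], k: int, iters: int = 500) -> List[Vector]:
    """Leading k unit eigenvectors of a symmetric matrix by power iteration with deflation."""
    a = [list(row) for row in cov]
    dims = len(a)
    comps: List[Vector] = []
    for _ in range(k):
        v = [1.0 + 0.1 * i for i in range(dims)]  # arbitrary non-symmetric starting vector
        for _ in range(iters):
            w = [dot(row, v) for row in a]
            norm = math.sqrt(dot(w, w))
            if norm == 0.0:
                return comps  # no variance left to explain
            v = [x / norm for x in w]
        lam = dot(v, [dot(row, v) for row in a])  # Rayleigh quotient = eigenvalue estimate
        comps.append(v)
        # Deflate: subtract lam * v v^T so the next pass finds the next component.
        a = [[a[i][j] - lam * v[i] * v[j] for j in range(dims)] for i in range(dims)]
    return comps


def to_latent(theta: Sequence[float], mu: Sequence[float], comps: Sequence[Vector]) -> Vector:
    """Low-dimensional coordinates z of a full pose theta."""
    diff = [x - m for x, m in zip(theta, mu)]
    return [dot(c, diff) for c in comps]


def from_latent(z: Sequence[float], mu: Sequence[float], comps: Sequence[Vector]) -> Vector:
    """Full pose mu + sum_i z_i * c_i rebuilt from low-dimensional coordinates."""
    pose = list(mu)
    for zi, c in zip(z, comps):
        pose = [p + zi * ci for p, ci in zip(pose, c)]
    return pose


# Toy training set: 3-D "poses" that vary along a single direction u.
u = [1.0 / 3.0, 2.0 / 3.0, 2.0 / 3.0]
offset = [0.5, 0.0, -1.0]
poses = [[o + t * ui for o, ui in zip(offset, u)] for t in (-3, -2, -1, 0, 1, 2, 3)]
mu = mean_pose(poses)
comps = principal_components(covariance(poses, mu), k=1)
print(to_latent(poses[5], mu, comps))  # approximately [2.0]
```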

Modeling the state transition likelihood $p(\theta(t) \mid \theta(t-1))$ in Equation (7.36) is similar to the methods discussed in Section 7.4.3. For example, we can use single-frame and dynamical constraints based on biomechanical training data, in addition to incorporating character- or activity-specific learned models. In the rest of this section, we briefly overview typical features for markerless motion capture, which are used to form the observation likelihood $p(r(t) \mid \theta(t))$ in Equation (7.36).
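
The two likelihoods enter a basic (bootstrap) particle filter at different stages: particles are propagated by sampling the state transition model, then reweighted by the observation likelihood. The Python sketch below shows that predict-weight-resample cycle. The random-walk transition and Gaussian observation model are hypothetical placeholders for the learned motion models and image features discussed in this section, and the function names are our own.

```python
import math
import random
from typing import Callable, List, Sequence

Pose = List[float]


def sir_step(
    particles: List[Pose],
    sample_transition: Callable[[Pose, random.Random], Pose],
    observation_likelihood: Callable[[Pose], float],
    rng: random.Random,
) -> List[Pose]:
    """One predict-weight-resample step of a bootstrap particle filter.

    1. Predict: draw theta(t) from p(theta(t) | theta(t-1)) for every particle.
    2. Weight:  pi_k proportional to p(r(t) | theta(t)) at each prediction.
    3. Resample with replacement in proportion to pi_k, leaving equal weights."""
    predicted = [sample_transition(p, rng) for p in particles]
    weights = [observation_likelihood(p) for p in predicted]
    if sum(weights) <= 0.0:
        raise ValueError("observation likelihood is zero for every particle")
    return [list(p) for p in rng.choices(predicted, weights=weights, k=len(predicted))]


def random_walk(sigma: float) -> Callable[[Pose, random.Random], Pose]:
    """Hypothetical transition model: add independent Gaussian noise to each dimension."""
    def step(p: Pose, rng: random.Random) -> Pose:
        return [x + rng.gauss(0.0, sigma) for x in p]
    return step


def gaussian_observation(r: Sequence[float], sigma: float) -> Callable[[Pose], float]:
    """Hypothetical observation model: the pose is observed directly with Gaussian noise."""
    def lik(p: Pose) -> float:
        d2 = sum((x - ri) ** 2 for x, ri in zip(p, r))
        return math.exp(-d2 / (2.0 * sigma ** 2))
    return lik


rng = random.Random(1)
particles = [[rng.uniform(-1.0, 1.0)] for _ in range(500)]
for _ in range(10):  # the 1-D "pose" is repeatedly observed at 2.0
    particles = sir_step(particles, random_walk(0.3), gaussian_observation([2.0], 0.5), rng)
print(sum(p[0] for p in particles) / len(particles))
```

Resampling at every step is the simplest policy; many implementations resample only when the weights become too uneven, which reduces the loss of particle diversity.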

7.7.2 Silhouettes and Edges

Conventional motion capture systems use a large number of cameras to triangulate the 3D positions of markers from their observed images. In contrast, markerless systems often use a smaller number of cameras (as few as one), using features extracted from the set of images at each instant as the basis for pose estimation. When multiple cameras are involved, they must be calibrated using methods similar to those in Section 7.1.
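
To make the role of calibration concrete: assuming each calibrated camera is summarized by a 3x4 projection matrix $P$ in the standard pinhole model, a candidate 3D joint position can be projected into every view and compared with where that joint appears in each image. The sketch below computes that projection and a summed squared reprojection error; the function names and toy camera values are hypothetical.

```python
from typing import Sequence, Tuple

Matrix34 = Sequence[Sequence[float]]


def project(P: Matrix34, X: Sequence[float]) -> Tuple[float, float]:
    """Project 3D point X with a 3x4 camera matrix P: (u*w, v*w, w) = P (x, y, z, 1)."""
    Xh = (X[0], X[1], X[2], 1.0)
    uw, vw, w = (sum(P[r][c] * Xh[c] for c in range(4)) for r in range(3))
    if w == 0.0:
        raise ValueError("point lies on the camera's principal plane")
    return uw / w, vw / w


def reprojection_error(cameras: Sequence[Matrix34], X: Sequence[float],
                       observed: Sequence[Tuple[float, float]]) -> float:
    """Sum over views of squared pixel distance between projected and observed points."""
    total = 0.0
    for P, (u_obs, v_obs) in zip(cameras, observed):
        u, v = project(P, X)
        total += (u - u_obs) ** 2 + (v - v_obs) ** 2
    return total


# Two toy cameras: focal length 100, principal point (50, 40); the second is shifted 1 unit along x.
cam_a = [[100.0, 0.0, 50.0, 0.0], [0.0, 100.0, 40.0, 0.0], [0.0, 0.0, 1.0, 0.0]]
cam_b = [[100.0, 0.0, 50.0, -100.0], [0.0, 100.0, 40.0, 0.0], [0.0, 0.0, 1.0, 0.0]]
joint = (1.0, 2.0, 10.0)
print(project(cam_a, joint), project(cam_b, joint))  # (60.0, 60.0) (50.0, 60.0)
```

Without markers, the observed 2D joint locations are themselves hard to obtain, which is why markerless methods typically compare projected body models against image features such as silhouettes and edges instead.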

We begin by illustrating why the markerless problem is hard. Figure 7.20 shows several inherent difficulties with estimating pose from a single image. First, we face the challenging problem of isolating the human figure from a non-ideal background, a natural image matting problem of the type discussed in Chapter 2. This can be mitigated by using as simple a background as possible, but markerless methods rarely go so far as to use a green screen.
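
When the background is static and a clean plate of it can be captured, a crude alternative to full natural image matting is background subtraction: label a pixel as foreground if it differs enough from the plate. The sketch below assumes grayscale images stored as nested lists and a hand-picked threshold (both our assumptions); the function names are hypothetical. Unlike the matting methods of Chapter 2, it yields a hard binary mask rather than fractional alpha values, and it fails where clothing resembles the background.

```python
from typing import List, Sequence

Image = Sequence[Sequence[float]]  # grayscale intensities, indexed [row][column]


def foreground_mask(frame: Image, background: Image, threshold: float) -> List[List[bool]]:
    """True where |frame - background| exceeds threshold, i.e. likely part of the figure."""
    return [[abs(f - b) > threshold for f, b in zip(frow, brow)]
            for frow, brow in zip(frame, background)]


def mask_area(mask: Sequence[Sequence[bool]]) -> int:
    """Number of foreground pixels in a binary mask."""
    return sum(sum(row) for row in mask)


background = [[10, 10, 10], [10, 10, 10]]
frame = [[10, 80, 12], [90, 10, 10]]
mask = foreground_mask(frame, background, threshold=20)
print(mask, mask_area(mask))
```

The threshold trades missed body regions against shadows and sensor noise being labeled as foreground.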

Assuming we've accurately segmented the human from the background, several challenges remain. Foremost, in the absence of carefully placed markers and tight-fitting clothing, the positions of joints are much more difficult to infer from an image. For example, in Figure 7.20 the torso and arms all have the same texture and are
