The CONDENSATION algorithm used in tracking, together with a comparison against Kalman filters, can be found in Isard & Blake (1998).
In Wachter & Nagel (1999), a 3D model composed of right-elliptical cones is
fitted to consecutive frames by means of an iterated extended Kalman filter. A
motion model of constant velocity for all DOFs is used for prediction, while the
update of the parameters is based on a maximum a-posteriori estimation
incorporating edge and region information. This approach is able to cope with
self-occlusions occurring between the legs of a walking person.
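To make the prediction step concrete, the sketch below implements a constant-velocity Kalman prediction over a generic joint-angle state. The state layout, time step and process-noise level are illustrative assumptions and are not taken from Wachter & Nagel (1999), whose filter also includes an iterated, linearized measurement update that is omitted here.

```python
import numpy as np

def predict_constant_velocity(x, P, dt=1.0, q=1e-2):
    """Kalman prediction assuming constant velocity for every DOF.

    x  : state vector [angles, angular velocities], length 2*n
    P  : state covariance, shape (2n, 2n)
    dt : time step between frames (assumed)
    q  : process-noise magnitude (assumed)
    """
    n = x.size // 2
    # Transition: angle_{k+1} = angle_k + dt * velocity_k, velocity constant.
    F = np.eye(2 * n)
    F[:n, n:] = dt * np.eye(n)
    Q = q * np.eye(2 * n)           # simplistic isotropic process noise
    x_pred = F @ x                  # predicted state
    P_pred = F @ P @ F.T + Q        # predicted covariance
    return x_pred, P_pred

# Example: three DOFs (e.g., hip, knee, ankle angles) plus their velocities.
x0, P0 = np.zeros(6), np.eye(6)
x1, P1 = predict_constant_velocity(x0, P0)
```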
Self-occlusions are also tackled in a Bayesian tracking system presented in Howe, Leventon & Freeman (1999). This system tracks human figures in short monocular sequences and reconstructs their motion in 3D, using prior information learned from training data. The training data consist of vectors, each gathered over 11 successive frames and containing the 3D coordinates of 20 tracked body points; these vectors are used to build a mixture-of-Gaussians probability density model. 3D reconstruction is achieved by establishing correspondence between the training data and the extracted features.
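A prior of this form can be sketched as follows: the snippet fits a mixture of Gaussians to vectors that stack 11 successive frames of 20 body points in 3D, then scores a candidate reconstruction under the learned density. The placeholder data, the use of scikit-learn's GaussianMixture, and the component count are assumptions for illustration, not the estimator described in Howe, Leventon & Freeman (1999).

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Each training vector stacks 11 successive frames of 20 body points in 3D.
FRAMES, POINTS, DIMS = 11, 20, 3
VEC_LEN = FRAMES * POINTS * DIMS          # 660-dimensional motion snippets

# Placeholder training data; in practice these come from motion capture.
rng = np.random.default_rng(0)
train = rng.normal(size=(500, VEC_LEN))

# Mixture-of-Gaussians density over the motion snippets (assumed settings).
prior = GaussianMixture(n_components=10, covariance_type="diag").fit(train)

# Score a hypothesized 3D reconstruction: higher log-likelihood means the
# candidate snippet is more consistent with the learned motion prior.
candidate = rng.normal(size=(1, VEC_LEN))
log_prior = prior.score_samples(candidate)[0]
```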
Sidenbladh, Black & Sigal (2002) also use a probabilistic approach to address the problem of modeling 3D human motion for synthesis and tracking. They avoid the high dimensionality and non-linearity of body movement modeling by representing the posterior distribution non-parametrically. Learning state transition probabilities is replaced with an efficient probabilistic search in a large training set. An approximate probabilistic tree-search method takes advantage of the coefficients of a low-dimensional model and returns a particular sample human motion.
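The idea of replacing learned transition probabilities with a search through training data can be pictured with a low-dimensional index over stored motions. Below, a PCA projection and a kd-tree (SciPy's cKDTree) stand in for the approximate probabilistic tree search; the dimensions, placeholder data and deterministic nearest-neighbour query are assumptions for illustration, not the method of Sidenbladh, Black & Sigal (2002).

```python
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(1)

# Training set: many short motion snippets flattened to vectors (placeholder).
train = rng.normal(size=(2000, 660))

# Low-dimensional model: project snippets onto the top principal components.
mean = train.mean(axis=0)
centered = train - mean
_, _, vt = np.linalg.svd(centered, full_matrices=False)
basis = vt[:20]                          # assumed 20-dimensional coefficient space
coeffs = centered @ basis.T

# Index the coefficients so the tracker can quickly retrieve training motions
# close to its current estimate instead of sampling a learned dynamics model.
tree = cKDTree(coeffs)

query = (rng.normal(size=660) - mean) @ basis.T
dists, idx = tree.query(query, k=5)      # nearest stored motions
samples = train[idx]                     # returned example human motions
```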
In contrast to single-view approaches, multiple camera techniques are able to
overcome occlusions and depth ambiguities of the body parts, since useful motion
information missing from one view may be recovered from another view.
A rich set of features is used in Okada, Shirai & Miura (2000) for the estimation
of the 3D translation and rotation of the human body. Foreground regions are
extracted by combining optical flow, depth (which is calculated from a pair of
stereo images) and prediction information. 3D pose estimation is then based on
the position and shape of the extracted region and on past states using Kalman
filtering. The evident problem of pose singularities is tackled probabilistically.
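One way to picture the fusion of these cues is as a vote over binary masks; the thresholds and the two-out-of-three voting rule below are illustrative assumptions, not the segmentation procedure of Okada, Shirai & Miura (2000).

```python
import numpy as np

def foreground_mask(flow_mag, disparity, predicted_mask,
                    flow_thresh=1.0, near_disp=8.0):
    """Combine motion, depth and prediction cues into a foreground mask.

    flow_mag       : per-pixel optical-flow magnitude
    disparity      : per-pixel stereo disparity (larger = closer)
    predicted_mask : boolean mask projected from the previous pose estimate
    Threshold values are assumptions chosen for illustration.
    """
    moving = flow_mag > flow_thresh            # pixels that moved
    close = disparity > near_disp              # pixels near the cameras
    # Keep pixels supported by at least two of the three cues.
    votes = moving.astype(int) + close.astype(int) + predicted_mask.astype(int)
    return votes >= 2

# Example on synthetic 120x160 inputs.
h, w = 120, 160
rng = np.random.default_rng(2)
mask = foreground_mask(rng.random((h, w)) * 3.0,
                       rng.random((h, w)) * 16.0,
                       np.zeros((h, w), dtype=bool))
```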
A framework for person tracking in various indoor scenes is presented in Cai &
Aggarwal (1999), using three synchronized cameras. Though there are three
cameras, tracking is actually based on one camera view at a time. When the
system predicts that the active camera no longer provides a sufficient view of
the person, it is deactivated and the camera providing the best view is selected.
Feature correspondence between consecutive frames is achieved using Bayesian classification schemes associated with motion analysis in a spatio-temporal
domain. However, this method cannot deal with occlusions above a certain level.
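A minimal version of this switching rule could project the predicted 3D position of the person into every camera and keep the view in which the subject lies farthest from the image border. The projection model and the margin criterion below are assumptions for illustration, not the view-selection rule of Cai & Aggarwal (1999).

```python
import numpy as np

def select_camera(pred_xyz, projections, image_size=(640, 480)):
    """Pick the camera giving the widest margin around a predicted 3D point.

    pred_xyz    : predicted 3D position of the tracked person
    projections : list of 3x4 camera projection matrices
    """
    w, h = image_size
    best_idx, best_margin = -1, -np.inf
    for i, P in enumerate(projections):
        u, v, s = P @ np.append(pred_xyz, 1.0)
        if s <= 0:                              # person is behind this camera
            continue
        x, y = u / s, v / s
        margin = min(x, w - x, y, h - y)        # distance to the nearest edge
        if margin > best_margin:
            best_idx, best_margin = i, margin
    return best_idx

# Example: two toy cameras sharing the same intrinsics, offset along x.
K = np.array([[500., 0., 320.], [0., 500., 240.], [0., 0., 1.]])
P0 = K @ np.hstack([np.eye(3), [[0.], [0.], [0.]]])
P1 = K @ np.hstack([np.eye(3), [[-2.], [0.], [0.]]])
best = select_camera(np.array([0.5, 0.2, 4.0]), [P0, P1])   # -> 0
```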