Dockstader & Tekalp (2001) introduce a distributed real-time platform for
tracking multiple interacting people using multiple cameras. The features
extracted from each camera view are independently processed. The resulting
state vectors comprise the input to a Bayesian belief network. The observations
of each camera are then fused and the most likely 3D position estimates are
computed. A Kalman filter performs state propagation over time. Multiple
viewpoints and a viewpoint selection strategy are also employed in Utsumi et al.
(1998) to cope with self-occlusions and human-human occlusions. In this approach,
tracking is likewise based on Kalman filter estimation, but it is decomposed into
three sub-tasks (position detection, rotation-angle estimation, and body-side
detection). Each sub-task has its own criterion for selecting viewpoints, and the
result of one sub-task can inform estimation in another.
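The state propagation used in both systems can be sketched with a one-dimensional constant-velocity Kalman filter. The function below is an illustrative reduction, not either paper's implementation; the variable names and noise values `q` and `r` are assumptions:

```python
# Minimal 1-D constant-velocity Kalman filter step (illustrative sketch).
# State: position x and velocity v; covariance p as a flat 2x2 tuple.

def kalman_step(x, v, p, z, dt=1.0, q=0.01, r=0.25):
    """One predict/update cycle against a noisy position measurement z."""
    # Predict: state transition F = [[1, dt], [0, 1]]
    x_pred = x + v * dt
    v_pred = v
    pxx, pxv, pvx, pvv = p
    # P' = F P F^T + Q (process noise q on the diagonal)
    pxx_p = pxx + dt * (pxv + pvx) + dt * dt * pvv + q
    pxv_p = pxv + dt * pvv
    pvx_p = pvx + dt * pvv
    pvv_p = pvv + q
    # Update with position measurement (H = [1, 0])
    s = pxx_p + r                    # innovation covariance
    kx, kv = pxx_p / s, pvx_p / s    # Kalman gain
    y = z - x_pred                   # innovation (measurement residual)
    x_new = x_pred + kx * y
    v_new = v_pred + kv * y
    # P = (I - K H) P'
    p_new = ((1 - kx) * pxx_p, (1 - kx) * pxv_p,
             pvx_p - kv * pxx_p, pvv_p - kv * pxv_p)
    return x_new, v_new, p_new
```

Iterating predict/update cycles against a stream of position measurements drives the position and velocity estimates toward the true trajectory; the cited systems apply the same machinery to higher-dimensional 3D position states fused across cameras.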
Delamarre & Faugeras (2001) propose a technique that copes not only with
self-occlusions, but also with fast movements and poor-quality images, using two
or more fixed cameras. The approach applies physical forces to each rigid part of
a kinematic 3D human body model composed of truncated cones. These forces
guide the 3D model toward convergence with the body posture in the image. The
model's projections are compared with the silhouettes extracted from the image
by means of a novel approach that combines Maxwell's demons algorithm with
the classical ICP (iterative closest point) algorithm.
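The ICP side of that comparison can be illustrated with a minimal 2D, translation-only variant: repeatedly match each model point to its closest silhouette point and shift the model by the mean residual. This is a drastic simplification of the paper's force-driven 3D fitting, intended only to convey the iterate-match-move loop:

```python
# Translation-only 2-D ICP sketch (illustrative, not the paper's method).

def icp_translation(model, target, iters=20):
    """Align `model` points to `target` points by repeatedly matching each
    model point to its nearest target point and applying the mean offset."""
    tx = ty = 0.0
    for _ in range(iters):
        dx_sum = dy_sum = 0.0
        for mx, my in model:
            px, py = mx + tx, my + ty
            # brute-force closest target point
            cx, cy = min(target, key=lambda t: (t[0] - px) ** 2 + (t[1] - py) ** 2)
            dx_sum += cx - px
            dy_sum += cy - py
        # move the model by the mean residual toward the target
        tx += dx_sum / len(model)
        ty += dy_sum / len(model)
    return tx, ty
```

A full ICP also estimates rotation (via a Procrustes-style least-squares step), and Delamarre & Faugeras additionally translate the residuals into forces on an articulated 3D model rather than a rigid point set.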
Some recently published papers specifically tackle the pose recovery problem
using multiple sensors. A real-time method for 3D posture estimation using
trinocular images is introduced in Iwasawa et al. (2000). In each image the
human silhouette is extracted and the upper-body orientation is detected. With
a heuristic contour analysis of the silhouette, some representative points, such as
the top of the head, are located. Two of the three views are finally selected in
order to estimate the 3D coordinates of the representative points and joints. It is
experimentally shown that the view-selection strategy results in more accurate
estimates than the use of all views.
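The heuristic contour analysis can be caricatured as extremal-point selection on the silhouette. The labels below are illustrative guesses for an upright figure, not the paper's actual rules, which also use the detected upper-body orientation:

```python
# Toy representative-point extraction from a binary silhouette
# given as a list of (x, y) foreground pixels; y grows downward.

def representative_points(silhouette):
    """Pick extremal silhouette pixels as candidate body landmarks."""
    top = min(silhouette, key=lambda p: p[1])     # smallest y: head candidate
    left = min(silhouette, key=lambda p: p[0])    # leftmost: e.g. a hand tip
    right = max(silhouette, key=lambda p: p[0])   # rightmost: e.g. a hand tip
    bottom = max(silhouette, key=lambda p: p[1])  # largest y: feet candidate
    return {"head": top, "left": left, "right": right, "feet": bottom}
```

Once such points are located in two selected views, their 3D coordinates follow from standard stereo triangulation.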
Multiple views in Rosales et al. (2001) are obtained by introducing the concept
of “virtual cameras”, which is based on the transformation invariance of the Hu
moments. One advantage of this approach is that no camera calibration is
required. A Specialized Mappings Architecture is proposed, which maps image
features directly to 2D image locations of body points. Given
correspondences of the most likely 2D joint locations in virtual camera views, 3D
body pose can be recovered using a generalized probabilistic structure from
motion technique.
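The transformation invariance underlying the "virtual cameras" idea can be seen in the first Hu invariant, sketched here for a plain point set using the standard normalization eta_pq = mu_pq / mu00^(1+(p+q)/2); for a point set, mu00 is simply the point count:

```python
# First Hu moment invariant of a 2-D point set (illustrative sketch).

def hu1(points):
    """Return eta20 + eta02, invariant to translation, rotation, and scale."""
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    mu20 = sum((x - cx) ** 2 for x, _ in points)   # central moment, order (2,0)
    mu02 = sum((y - cy) ** 2 for _, y in points)   # central moment, order (0,2)
    mu00 = float(n)
    return (mu20 + mu02) / mu00 ** 2  # eta_pq = mu_pq / mu00^(1+(p+q)/2)
```

Because the value is unchanged when the point set is translated or rotated, features built on Hu moments can be matched across real and "virtual" camera views without calibration, which is the property Rosales et al. exploit.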