appearance of the object in the image as a result of the projection from the three-
dimensional scene into the two-dimensional image plane is addressed, which may
lead to reconstruction errors, especially when partial self-occlusions of the body
occur. Body pose estimation methods are divided by Sminchisescu (2008) into
generative algorithms, which rely on a model of the observation likelihood that
is supposed to attain its maximum once the pose parameters have been estimated
correctly, and discriminative algorithms, which learn the probability distribution
of the pose parameters from examples and predict them using Bayesian inference.
An early approach by Gavrila and Davis (1996) to full body pose estimation
involves template matching in several distance-transformed images acquired from
different viewpoints distributed around the person. Plänkers and Fua (2003) and
Rosenhahn et al. (2005) apply multiple-view three-dimensional pose estimation
algorithms which are based on silhouette information.
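The distance-transform matching used by Gavrila and Davis can be conveyed with a minimal chamfer-style sketch: the edge points of a body template are scored against a distance transform of the image edges, so the score degrades smoothly under small misalignments instead of failing abruptly. The brute-force distance map, function names, and toy images below are purely illustrative and are not taken from the original paper, which uses hierarchical matching over many templates.

```python
import numpy as np

def distance_map(edge_image):
    """Brute-force Euclidean distance from every pixel to the nearest edge pixel
    (a real implementation would use a fast distance transform)."""
    h, w = edge_image.shape
    edge_pts = np.argwhere(edge_image).astype(float)          # (E, 2) edge coords
    grid = np.indices((h, w)).reshape(2, -1).T.astype(float)  # (h*w, 2) all pixels
    d = np.sqrt(((grid[:, None, :] - edge_pts[None, :, :]) ** 2).sum(-1))
    return d.min(axis=1).reshape(h, w)

def chamfer_score(edge_image, template_points, offset):
    """Mean distance from the shifted template points to the nearest image edge;
    lower is better."""
    dist = distance_map(edge_image)
    pts = np.asarray(template_points) + np.asarray(offset)
    pts = np.clip(pts, 0, np.array(edge_image.shape) - 1)  # stay inside the image
    return dist[pts[:, 0], pts[:, 1]].mean()

# Toy example: a vertical image edge at column 10, template points at column 8.
edges = np.zeros((20, 20), dtype=bool)
edges[:, 10] = True
template = np.array([[r, 8] for r in range(5, 15)])

print(chamfer_score(edges, template, (0, 0)))  # 2.0: template is 2 pixels off
print(chamfer_score(edges, template, (0, 2)))  # 0.0: template lies on the edge
```

Scoring against the distance transform rather than the raw edge map is what makes the search tolerant of small shape and pose deviations between template and image.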
Plänkers and Fua (2003) make use of three-dimensional data generated by a
stereo camera to estimate and track the pose of the human upper body. The
upper body is modelled with implicit surfaces, and silhouettes are used in
addition to the depth data to fit the surfaces. Lange et al. (2004) propose a
method for tracking the movements of a human body in a sequence of images
acquired by at least two cameras, based on the adaptation of a three-dimensional
stick model with a stochastic optimisation algorithm. Comparing the appearance
of the stick model after projection into the image plane with the acquired images
yields an appropriate error function. A refinement of the correspondingly estimated
joint angles of the stick model is obtained based on several pairs of stereo images.
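The project-and-compare structure of such an error function can be sketched as follows: the three-dimensional joint positions of a stick model are projected into the image plane with a pinhole camera, and the summed squared distance to observed two-dimensional positions serves as the cost a stochastic optimiser would minimise. The pinhole model, coordinates, and function names here are illustrative assumptions; in the actual method the projected model is compared with the acquired images themselves rather than with pre-extracted joint positions.

```python
import numpy as np

def project(points_3d, focal_length=1.0):
    """Pinhole projection of 3D points (in camera coordinates, z > 0)
    into the image plane."""
    P = np.asarray(points_3d, dtype=float)
    return focal_length * P[:, :2] / P[:, 2:3]

def reprojection_error(model_joints_3d, observed_joints_2d, focal_length=1.0):
    """Sum of squared distances between projected model joints and observed
    image positions -- the kind of error a stochastic optimiser minimises."""
    diff = project(model_joints_3d, focal_length) - np.asarray(observed_joints_2d)
    return float((diff ** 2).sum())

# Toy example: two joints at depth 2; the observations match the projection.
joints = np.array([[0.0, 0.0, 2.0], [0.4, 0.2, 2.0]])
obs = np.array([[0.0, 0.0], [0.2, 0.1]])
print(reprojection_error(joints, obs))       # 0.0 for a perfect fit
print(reprojection_error(joints + 0.1, obs) > 0.0)  # perturbed pose costs more
```

Evaluating this cost for randomly perturbed pose parameters and keeping the best candidate is the essence of the stochastic optimisation step.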
Rosenhahn et al. (2005) track the upper body of a person, which is represented
by a three-dimensional model with 21 body pose parameters consisting of connected
free-form surfaces. The pose estimation is based on silhouettes which are extracted
using level set functions. Tracking is performed by using the pose in the last frame
as the initial pose in the current frame. Using images acquired synchronously by
four cameras distributed around the person, they achieve a high reconstruction
accuracy of about 2° for the joint angles under laboratory conditions without a
cluttered background. As ground truth, the joint angles determined by a commercial
marker-based tracking system with eight cameras are used. In an extension of this
method by Brox et al. (2008), the silhouette of the person is inferred from a single
image or several images acquired by cameras distributed around the person, and a
three-dimensional model representing the body surface is adapted to the silhouettes.
The procedures of pose estimation and silhouette extraction based on level sets are
alternated in order to allow tracking in scenes with a non-uniform and non-static
background. In this context, Bayesian inference involving the local probability den-
sity models of image regions with respect to different features such as grey value,
RGB colour values, or texture is used for simultaneously extracting a contour and
a set of pose parameters. For large pose differences between successive images,
prediction of the pose is achieved based on the optical flow. Since the pose esti-
mation yields correspondences between the two-dimensional silhouettes in the im-
ages and the three-dimensional body model, while the optical flow yields correspon-
dences between two-dimensional image positions in the current and the subsequent
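The region-based Bayesian inference mentioned above can be illustrated with a minimal example: each image region (person and background) carries a probability density model of a feature such as the grey value, and a pixel is assigned to the region with the larger posterior probability. The Gaussian models and all parameter values below are made up for illustration; the actual method uses more general local density models over several features such as colour and texture.

```python
import math

def gaussian_pdf(x, mu, sigma):
    """Density of a one-dimensional Gaussian, used as a per-region feature model."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def posterior_foreground(grey, fg=(0.7, 0.1), bg=(0.3, 0.1), prior_fg=0.5):
    """Posterior probability that a pixel belongs to the person region, by Bayes'
    rule over two Gaussian grey-value models (all parameters are invented)."""
    num = prior_fg * gaussian_pdf(grey, *fg)
    den = num + (1.0 - prior_fg) * gaussian_pdf(grey, *bg)
    return num / den

print(posterior_foreground(0.68))  # close to 1: likely person region
print(posterior_foreground(0.32))  # close to 0: likely background
```

In the full method these per-pixel posteriors enter the level set evolution, so the contour and the pose parameters are estimated jointly rather than pixels being classified in isolation.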