appearance of the object in the image as a result of the projection from the three-
dimensional scene into the two-dimensional image plane is addressed, which may
lead to reconstruction errors, especially when partial self-occlusions of the body
occur. Body pose estimation methods are divided by Sminchisescu (2008) into
generative algorithms, which rely on a model of the observation likelihood that
is supposed to attain its maximum once the pose parameters have been estimated
correctly, and discriminative algorithms, which learn the probability distribution
of the pose parameters from examples and predict them using Bayesian inference.
An early approach by Gavrila and Davis (1996) to full body pose estimation
involves template matching in several distance-transformed images acquired from
different viewpoints distributed around the person. Plänkers and Fua (2003) and
Rosenhahn et al. (2005) apply multiple-view three-dimensional pose estimation
algorithms which are based on silhouette information.
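The distance-transform matching used by Gavrila and Davis can be conveyed with a minimal chamfer-style sketch: the edge points of a body template are scored against a distance transform of the image edges, so the score degrades smoothly under small misalignments instead of failing abruptly. The brute-force distance map, function names, and toy images below are purely illustrative and are not taken from the original paper, which uses hierarchical matching over many templates.

```python
import numpy as np

def distance_map(edge_image):
    """Brute-force Euclidean distance from every pixel to the nearest edge pixel
    (a real implementation would use a fast distance transform)."""
    h, w = edge_image.shape
    edge_pts = np.argwhere(edge_image).astype(float)          # (E, 2) edge coords
    grid = np.indices((h, w)).reshape(2, -1).T.astype(float)  # (h*w, 2) all pixels
    d = np.sqrt(((grid[:, None, :] - edge_pts[None, :, :]) ** 2).sum(-1))
    return d.min(axis=1).reshape(h, w)

def chamfer_score(edge_image, template_points, offset):
    """Mean distance from the shifted template points to the nearest image edge;
    lower is better."""
    dist = distance_map(edge_image)
    pts = np.asarray(template_points) + np.asarray(offset)
    pts = np.clip(pts, 0, np.array(edge_image.shape) - 1)  # stay inside the image
    return dist[pts[:, 0], pts[:, 1]].mean()

# Toy example: a vertical image edge at column 10, template points at column 8.
edges = np.zeros((20, 20), dtype=bool)
edges[:, 10] = True
template = np.array([[r, 8] for r in range(5, 15)])

print(chamfer_score(edges, template, (0, 0)))  # 2.0: template is 2 pixels off
print(chamfer_score(edges, template, (0, 2)))  # 0.0: template lies on the edge
```

Scoring against the distance transform rather than the raw edge map is what makes the search tolerant of small shape and pose deviations between template and image.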
Plänkers and Fua (2003) make use of three-dimensional data generated by a
stereo camera to estimate and track the pose of the human upper body. The
upper body is modelled with implicit surfaces, and silhouettes are used in
addition to the depth data to fit the surfaces. Lange et al. (2004) propose a
method for tracking the movements of a human body in a sequence of images
acquired by at least two cameras, based on the adaptation of a three-dimensional
stick model with a stochastic optimisation algorithm. Comparing the appearance
of the stick model after projection into the image plane with the acquired images
yields an appropriate error function. A refinement of the correspondingly estimated
joint angles of the stick model is obtained based on several pairs of stereo images.
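The project-and-compare structure of such an error function can be sketched as follows: the three-dimensional joint positions of a stick model are projected into the image plane with a pinhole camera, and the summed squared distance to observed two-dimensional positions serves as the cost a stochastic optimiser would minimise. The pinhole model, coordinates, and function names here are illustrative assumptions; in the actual method the projected model is compared with the acquired images themselves rather than with pre-extracted joint positions.

```python
import numpy as np

def project(points_3d, focal_length=1.0):
    """Pinhole projection of 3D points (in camera coordinates, z > 0)
    into the image plane."""
    P = np.asarray(points_3d, dtype=float)
    return focal_length * P[:, :2] / P[:, 2:3]

def reprojection_error(model_joints_3d, observed_joints_2d, focal_length=1.0):
    """Sum of squared distances between projected model joints and observed
    image positions -- the kind of error a stochastic optimiser minimises."""
    diff = project(model_joints_3d, focal_length) - np.asarray(observed_joints_2d)
    return float((diff ** 2).sum())

# Toy example: two joints at depth 2; the observations match the projection.
joints = np.array([[0.0, 0.0, 2.0], [0.4, 0.2, 2.0]])
obs = np.array([[0.0, 0.0], [0.2, 0.1]])
print(reprojection_error(joints, obs))       # 0.0 for a perfect fit
print(reprojection_error(joints + 0.1, obs) > 0.0)  # perturbed pose costs more
```

Evaluating this cost for randomly perturbed pose parameters and keeping the best candidate is the essence of the stochastic optimisation step.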
Rosenhahn et al. (2005) track the upper body of a person, which is represented
by a three-dimensional model with 21 body pose parameters consisting of connected
free-form surfaces. The pose estimation is based on silhouettes which are extracted
using level set functions. Tracking is performed by using the pose in the last frame
as the initial pose in the current frame. Using images acquired synchronously by
four cameras distributed around the person, they achieve a high reconstruction
accuracy of about 2° for the joint angles under laboratory conditions without a
cluttered background. As ground truth, the joint angles determined by a commercial
marker-based tracking system with eight cameras are used. In an extension of this
method by Brox et al. (2008), the silhouette of the person is inferred from a single
image or several images acquired by cameras distributed around the person, and a
three-dimensional model representing the body surface is adapted to the silhouettes.
The procedures of pose estimation and silhouette extraction based on level sets are
alternated in order to allow tracking in scenes with a non-uniform and non-static
background. In this context, Bayesian inference involving the local probability den-
sity models of image regions with respect to different features such as grey value,
RGB colour values, or texture is used for simultaneously extracting a contour and
a set of pose parameters. For large pose differences between successive images,
prediction of the pose is achieved based on the optical flow. Since the pose esti-
mation yields correspondences between the two-dimensional silhouettes in the im-
ages and the three-dimensional body model, while the optical flow yields correspon-
dences between two-dimensional image positions in the current and the subsequent
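The region-based Bayesian inference mentioned above can be illustrated with a minimal example: each image region (person and background) carries a probability density model of a feature such as the grey value, and a pixel is assigned to the region with the larger posterior probability. The Gaussian models and all parameter values below are made up for illustration; the actual method uses more general local density models over several features such as colour and texture.

```python
import math

def gaussian_pdf(x, mu, sigma):
    """Density of a one-dimensional Gaussian, used as a per-region feature model."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def posterior_foreground(grey, fg=(0.7, 0.1), bg=(0.3, 0.1), prior_fg=0.5):
    """Posterior probability that a pixel belongs to the person region, by Bayes'
    rule over two Gaussian grey-value models (all parameters are invented)."""
    num = prior_fg * gaussian_pdf(grey, *fg)
    den = num + (1.0 - prior_fg) * gaussian_pdf(grey, *bg)
    return num / den

print(posterior_foreground(0.68))  # close to 1: likely person region
print(posterior_foreground(0.32))  # close to 0: likely background
```

In the full method these per-pixel posteriors enter the level set evolution, so the contour and the pose parameters are estimated jointly rather than pixels being classified in isolation.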