Dockstader & Tekalp (2001) introduce a distributed real-time platform for
tracking multiple interacting people using multiple cameras. The features
extracted from each camera view are independently processed. The resulting
state vectors comprise the input to a Bayesian belief network. The observations
of each camera are then fused and the most likely 3D position estimates are
computed. A Kalman filter performs state propagation over time. Multiple
viewpoints and a viewpoint selection strategy are also employed in Utsumi et al.
(1998) to cope with self-occlusions and human-human occlusions. In this approach,
tracking is likewise based on Kalman filter estimation, but it is decomposed into
three sub-tasks (position detection, rotation-angle estimation, and body-side
detection). Each sub-task has its own criterion for selecting viewpoints, and the
result of one sub-task can inform estimation in another.
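The state propagation used in both systems can be sketched with a one-dimensional constant-velocity Kalman filter. The function below is an illustrative reduction, not either paper's implementation; the variable names and noise values `q` and `r` are assumptions:

```python
# Minimal 1-D constant-velocity Kalman filter step (illustrative sketch).
# State: position x and velocity v; covariance p as a flat 2x2 tuple.

def kalman_step(x, v, p, z, dt=1.0, q=0.01, r=0.25):
    """One predict/update cycle against a noisy position measurement z."""
    # Predict: state transition F = [[1, dt], [0, 1]]
    x_pred = x + v * dt
    v_pred = v
    pxx, pxv, pvx, pvv = p
    # P' = F P F^T + Q (process noise q on the diagonal)
    pxx_p = pxx + dt * (pxv + pvx) + dt * dt * pvv + q
    pxv_p = pxv + dt * pvv
    pvx_p = pvx + dt * pvv
    pvv_p = pvv + q
    # Update with position measurement (H = [1, 0])
    s = pxx_p + r                    # innovation covariance
    kx, kv = pxx_p / s, pvx_p / s    # Kalman gain
    y = z - x_pred                   # innovation (measurement residual)
    x_new = x_pred + kx * y
    v_new = v_pred + kv * y
    # P = (I - K H) P'
    p_new = ((1 - kx) * pxx_p, (1 - kx) * pxv_p,
             pvx_p - kv * pxx_p, pvv_p - kv * pxv_p)
    return x_new, v_new, p_new
```

Iterating predict/update cycles against a stream of position measurements drives the position and velocity estimates toward the true trajectory; the cited systems apply the same machinery to higher-dimensional 3D position states fused across cameras.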
Delamarre & Faugeras (2001) propose a technique that copes not only with
self-occlusions, but also with fast movements and poor-quality images, using two
or more fixed cameras. The approach applies physical forces to each rigid part of
a kinematic 3D human body model composed of truncated cones. These forces
guide the 3D model toward convergence with the body posture in the image. The
model's projections are compared with the silhouettes extracted from the image
by means of a novel approach that combines Maxwell's demons algorithm with
the classical ICP (iterative closest point) algorithm.
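The ICP side of that comparison can be illustrated with a minimal 2D, translation-only variant: repeatedly match each model point to its closest silhouette point and shift the model by the mean residual. This is a drastic simplification of the paper's force-driven 3D fitting, intended only to convey the iterate-match-move loop:

```python
# Translation-only 2-D ICP sketch (illustrative, not the paper's method).

def icp_translation(model, target, iters=20):
    """Align `model` points to `target` points by repeatedly matching each
    model point to its nearest target point and applying the mean offset."""
    tx = ty = 0.0
    for _ in range(iters):
        dx_sum = dy_sum = 0.0
        for mx, my in model:
            px, py = mx + tx, my + ty
            # brute-force closest target point
            cx, cy = min(target, key=lambda t: (t[0] - px) ** 2 + (t[1] - py) ** 2)
            dx_sum += cx - px
            dy_sum += cy - py
        # move the model by the mean residual toward the target
        tx += dx_sum / len(model)
        ty += dy_sum / len(model)
    return tx, ty
```

A full ICP also estimates rotation (via a Procrustes-style least-squares step), and Delamarre & Faugeras additionally translate the residuals into forces on an articulated 3D model rather than a rigid point set.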
Some recently published papers specifically tackle the pose recovery problem
using multiple sensors. A real-time method for 3D posture estimation using
trinocular images is introduced in Iwasawa et al. (2000). In each image the
human silhouette is extracted and the upper-body orientation is detected. With
a heuristic contour analysis of the silhouette, some representative points, such as
the top of the head, are located. Two of the three views are finally selected in
order to estimate the 3D coordinates of the representative points and joints. It is
experimentally shown that the view-selection strategy results in more accurate
estimates than the use of all views.
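The heuristic contour analysis can be caricatured as extremal-point selection on the silhouette. The labels below are illustrative guesses for an upright figure, not the paper's actual rules, which also use the detected upper-body orientation:

```python
# Toy representative-point extraction from a binary silhouette
# given as a list of (x, y) foreground pixels; y grows downward.

def representative_points(silhouette):
    """Pick extremal silhouette pixels as candidate body landmarks."""
    top = min(silhouette, key=lambda p: p[1])     # smallest y: head candidate
    left = min(silhouette, key=lambda p: p[0])    # leftmost: e.g. a hand tip
    right = max(silhouette, key=lambda p: p[0])   # rightmost: e.g. a hand tip
    bottom = max(silhouette, key=lambda p: p[1])  # largest y: feet candidate
    return {"head": top, "left": left, "right": right, "feet": bottom}
```

Once such points are located in two selected views, their 3D coordinates follow from standard stereo triangulation.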
Multiple views in Rosales et al. (2001) are obtained by introducing the concept
of “virtual cameras”, which is based on the transformation invariance of the Hu
moments. One advantage of this approach is that no camera calibration is
required. A Specialized Mappings Architecture is proposed, which maps image
features directly to 2D image locations of body points. Given
correspondences of the most likely 2D joint locations in virtual camera views, 3D
body pose can be recovered using a generalized probabilistic structure from
motion technique.
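The transformation invariance underlying the "virtual cameras" idea can be seen in the first Hu invariant, sketched here for a plain point set using the standard normalization eta_pq = mu_pq / mu00^(1+(p+q)/2); for a point set, mu00 is simply the point count:

```python
# First Hu moment invariant of a 2-D point set (illustrative sketch).

def hu1(points):
    """Return eta20 + eta02, invariant to translation, rotation, and scale."""
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    mu20 = sum((x - cx) ** 2 for x, _ in points)   # central moment, order (2,0)
    mu02 = sum((y - cy) ** 2 for _, y in points)   # central moment, order (0,2)
    mu00 = float(n)
    return (mu20 + mu02) / mu00 ** 2  # eta_pq = mu_pq / mu00^(1+(p+q)/2)
```

Because the value is unchanged when the point set is translated or rotated, features built on Hu moments can be matched across real and "virtual" camera views without calibration, which is the property Rosales et al. exploit.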