on the characteristics of the application running in the system. For this purpose,
we focus on a specific algorithm, our proposed 3D human detection/activity
recognition system, and evaluate several extended aspects that are presented in
this section.
Algorithmic issues
In the section “3D Human Detection and Activity Recognition Techniques,” we
presented previous work on the basic steps of stereo vision algorithms and their
real-time applicability for different applications. In general, we can divide 3D
human detection and activity recognition methods into two categories (Cheung
et al., 2000): off-line methods, where the algorithms focus on detailed model
reconstruction (e.g., wire-frame generation), and real-time methods with global
3D human model reconstruction (Bregler & Malik, 1998; Delamarre & Faugeras,
2001).
The major challenge in many 3D applications is to compute dense range data at
high frame rates, since participants cannot easily communicate if the processing
cycle or network latencies are long. As an example of a non-real-time method,
consider the work of Mulligan et al. (2001). To achieve the required speed and
accuracy, they use a matching algorithm based on the sum of modified normalized
cross-correlations, combined with sub-pixel disparity interpolation. To increase
speed, they rely on Intel IPL functions for the pre-processing steps of background
subtraction and image rectification, as well as four-processor parallelization.
Even so, they achieve a speed of only 2-3 frames per second. Another
non-real-time method (Kakadiaris & Metaxas, 1995) was presented in the previous
section.
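To make the matching step concrete, the sketch below shows window-based stereo matching with normalized cross-correlation and a parabolic sub-pixel refinement of the winning disparity. It only illustrates the general technique named above; the window size, disparity range, and parabolic interpolation scheme are our own assumptions, not details of Mulligan et al.'s implementation.

```python
# Sketch of window-based stereo matching with normalized cross-correlation
# (NCC) and parabolic sub-pixel disparity refinement. Illustrative only:
# window size, disparity range, and the refinement scheme are assumptions.

import numpy as np


def ncc(patch_a, patch_b):
    """Normalized cross-correlation of two equally sized patches."""
    a = patch_a - patch_a.mean()
    b = patch_b - patch_b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum()) + 1e-12
    return float((a * b).sum() / denom)


def disparity_at(left, right, row, col, max_disp=32, half_win=3):
    """Sub-pixel disparity estimate for one pixel of the left image.

    Assumes rectified images, so corresponding points lie on the same row.
    """
    ref = left[row - half_win:row + half_win + 1,
               col - half_win:col + half_win + 1]
    scores = np.full(max_disp + 1, -np.inf)
    for d in range(max_disp + 1):
        c = col - d
        if c - half_win < 0:
            break
        cand = right[row - half_win:row + half_win + 1,
                     c - half_win:c + half_win + 1]
        scores[d] = ncc(ref, cand)

    d_best = int(np.argmax(scores))
    # Parabolic interpolation over the three scores around the peak gives a
    # sub-pixel offset (one common refinement choice).
    if 0 < d_best < max_disp and np.isfinite(scores[d_best + 1]):
        s_m, s_0, s_p = scores[d_best - 1], scores[d_best], scores[d_best + 1]
        denom = s_m - 2.0 * s_0 + s_p
        offset = 0.5 * (s_m - s_p) / denom if abs(denom) > 1e-12 else 0.0
        return d_best + offset
    return float(d_best)


if __name__ == "__main__":
    # Synthetic check: the right image is the left image shifted by 5 pixels.
    rng = np.random.default_rng(0)
    left = rng.random((64, 64)).astype(np.float32)
    right = np.roll(left, -5, axis=1)
    print(disparity_at(left, right, row=32, col=40))  # close to 5.0
```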
Most of the real-time methods use a generic 3D human model and fit its
projection to silhouette features extracted from the images. Other
silhouette-based methods are proposed by Cheung et al. (2000) and, more
recently, by Luck et al. (2002), in which the human model is fitted in real time
directly in the 3D domain. The first method reaches a speed of 15 frames per
second, whereas the second runs at 20 frames per second; the speed of both
systems depends strongly on the voxel resolution. None of these methods uses the
2D information obtained from each camera, combining high-level information
(e.g., head, torso, and hand locations and their activities) with low-level
information (e.g., ellipse parameters), to generate a global 3D model of the
human body parts and recognize their activities in 3D. 2D information, in the
form of human image position and body-part labeling, is a very valuable input
for higher-level modules; in our system, it forms the basis for constructing the
3D body and activity model.
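As a rough illustration of how labeled 2D body-part positions from calibrated cameras can be lifted into a common 3D model, the sketch below triangulates each part from two views with a linear (DLT) solver. The part names, projection matrices, and two-camera setup are illustrative assumptions and do not reproduce our system's actual fitting procedure.

```python
# Sketch: combine labeled 2D body-part positions from two calibrated cameras
# into 3D locations via linear (DLT) triangulation. Hypothetical setup; the
# part names and camera matrices are assumptions for illustration.

import numpy as np


def triangulate(P1, P2, x1, x2):
    """Triangulate one 3D point from its 2D projections in two views.

    P1, P2: 3x4 camera projection matrices.
    x1, x2: (u, v) image coordinates of the same body part in each view.
    """
    A = np.vstack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    # Solve A X = 0 in the least-squares sense via SVD; X is homogeneous.
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]
    return X[:3] / X[3]


def build_3d_body_model(P1, P2, parts_cam1, parts_cam2):
    """Combine per-camera 2D part locations into a 3D body-part model."""
    return {name: triangulate(P1, P2, parts_cam1[name], parts_cam2[name])
            for name in parts_cam1 if name in parts_cam2}


if __name__ == "__main__":
    # Two synthetic cameras: an identity view and a view translated along X.
    P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
    P2 = np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])
    # Hypothetical 2D detection of one body part ("head") in each camera.
    point = np.array([0.2, 0.1, 4.0])                      # true 3D position
    x1 = point[:2] / point[2]                              # projection in camera 1
    x2 = (point[:2] + np.array([-1.0, 0.0])) / point[2]    # projection in camera 2
    print(build_3d_body_model(P1, P2, {"head": x1}, {"head": x2}))
    # prints approximately [0.2, 0.1, 4.0]
```

With more than two cameras, the same linear system simply gains two rows per additional view, so the combination step scales naturally with the camera setup.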