understanding human actions by computers is to obtain robust action recog-
nition under variable illumination, background changes, camera motion and
zooming, viewpoint changes, and partial occlusions. Moreover, the system has
to cope with high intra-class variability: the actions can be performed by
people wearing different clothes and having different postures and sizes.
Two main families of approaches have been proposed and developed in the
literature on human action recognition: holistic and part-based representations.
Holistic representations focus on the whole human body, seeking global
characteristics such as contours or pose. Holistic methods that focus on the
contours of a person usually do not decompose the human body into parts
but consider the overall shape of the body in the analyzed frame.
Efros et al. [7] use cross-correlation between optical flow
descriptors in low-resolution videos. However, subjects must be tracked and
stabilized, and if the background is non-uniform, a figure-ground segmentation
is required. Bobick et al. [4] use motion history images that capture motion
and shape to represent actions. They introduced the global descriptors motion
energy image and motion history image. However, their method depends on
background subtraction. This method has been extended by Weinland et al.
[20]. Shechtman et al. [18] use similarity between space-time volumes, which
allows finding similar dynamic behaviors and actions but cannot handle
large geometric variations between intra-class samples, moving cameras, and
non-stationary backgrounds.
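The motion history image of Bobick et al. [4] mentioned above admits a very compact formulation: each pixel where motion is detected is set to a maximum duration tau, while all other pixels decay linearly toward zero, so recent motion appears brighter than older motion. A minimal sketch with numpy, assuming a binary motion mask obtained externally (e.g. by background subtraction or frame differencing; the function names here are illustrative, not the authors' code):

```python
import numpy as np

def update_mhi(mhi, motion_mask, tau):
    """One update step of a motion history image (MHI).

    Pixels flagged by motion_mask are reset to tau; all other pixels
    decay by 1, clipped at 0.
    """
    return np.where(motion_mask, float(tau), np.maximum(mhi - 1.0, 0.0))

def motion_energy_image(mhi):
    """The motion energy image (MEI) is the binary support of the MHI."""
    return mhi > 0
```

Iterating `update_mhi` over a frame sequence yields a single grayscale template per action, which is why the method inherits the strengths and weaknesses of the underlying background subtraction.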
Motion and trajectories are also commonly used features for recognizing
human actions, and their use can be regarded as pose estimation within
holistic approaches. Ramanan and Forsyth [15] track body parts and then use
the obtained motion trajectories to perform action recognition. In particular,
they track the humans in the sequences using a structured procedure; 3D
body configurations are then estimated and compared to a highly annotated 3D
motion library. Multiple cameras and 4D trajectories are used by Yilmaz et
al. [21] to recognize human actions in videos acquired by uncalibrated and
moving cameras. They propose to extend standard epipolar geometry
to the geometry of dynamic scenes and show the versatility of this method
for recognizing actions in challenging sequences. Ali et al. [3] use trajecto-
ries of the hands, feet and body, modeling the human body from experimental
data as a nonlinear, chaotic system.
Holistic methods may depend on the recording conditions, such as the position
of the pattern in the frame, the spatial resolution, and the relative motion with
respect to the camera, and can be influenced by variations in the background
and by occlusions. These problems can in principle be solved by external
mechanisms (e.g. spatial segmentation, camera stabilization, tracking), but
such mechanisms may be unstable in complex situations and increase the
computational demand.
Part-based representations typically search for Space-Time Interest Points
(STIPs) in the video, apply a robust description of the area around them, and
create a model based on independent features (Bag of Words) or a model