Human Motion Recognition
Human motion recognition may also be achieved by analyzing extracted 3D
pose parameters. However, because recovering such parameters requires extra
pre-processing, human motion patterns are usually recognized from low-level
features (e.g., silhouettes) obtained during tracking.
Ali & Aggarwal (2001) separate continuous human activity (e.g., walking,
sitting down, bending) into individual actions using a single camera. To
detect the commencement and termination of actions, the human skeleton is
extracted and the angles subtended by the torso, the upper leg, and the lower
leg are estimated. Each action is then recognized from the characteristic
path that these angles traverse. This technique, however, relies on lateral
views of the human body.
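As an illustration of this kind of angle-based representation, the per-frame angles could be computed as follows, assuming 2D joint positions are already available from the extracted skeleton (the function names and joint layout are hypothetical, not taken from the paper):

```python
import math

def segment_angle(p, q):
    """Inclination (degrees) of the segment from joint p to joint q,
    measured from the vertical image axis."""
    dx, dy = q[0] - p[0], q[1] - p[1]
    return math.degrees(math.atan2(dx, dy))

def pose_angles(neck, hip, knee, ankle):
    """Torso, upper-leg and lower-leg angles for one frame.
    Tracking these triples over time yields the characteristic
    path from which each action is recognized."""
    return (segment_angle(neck, hip),    # torso
            segment_angle(hip, knee),    # upper leg
            segment_angle(knee, ankle))  # lower leg
```

An upright, straight-legged pose yields angles near zero; a sitting-down sequence would trace a distinctive curve in this three-angle space.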
Park & Aggarwal (2000) propose a method for separating and classifying not
one person's actions, but two humans' interactions (shaking hands, pointing at
the opposite person, standing hand-in-hand) in indoor monocular grayscale
images with limited occlusions. The aim is to interpret interactions by inferring
the intentions of the persons. Recognition is independently achieved in each
frame by applying the K-nearest-neighbor classifier to a feature vector, which
describes the interpersonal configuration. Sato & Aggarwal (2001) also
address human interaction recognition. Their technique uses outdoor
monocular grayscale images and can cope with low-quality input, but is
limited to movements perpendicular to the camera. It can classify nine
two-person interactions (e.g., one person leaves another stationary person,
two people meet from different directions). Four features are extracted
from the trajectory of each person: the absolute velocity of each person,
their average size, the relative distance, and its derivative.
Identification is based on the features' similarity to an interaction
model, using the nearest-mean method.
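The four trajectory features and the nearest-mean matching might be sketched as follows (the function names and the toy interaction models are assumptions for illustration; in the paper the models are derived from observed interactions):

```python
import numpy as np

def interaction_features(traj_a, traj_b, size_a, size_b):
    """Per-frame feature vectors from two tracked trajectories:
    |v_a|, |v_b|, average size, relative distance, and its derivative."""
    traj_a = np.asarray(traj_a, float)
    traj_b = np.asarray(traj_b, float)
    va = np.linalg.norm(np.diff(traj_a, axis=0), axis=1)   # speed of person A
    vb = np.linalg.norm(np.diff(traj_b, axis=0), axis=1)   # speed of person B
    dist = np.linalg.norm(traj_a - traj_b, axis=1)         # relative distance
    ddist = np.diff(dist)                                  # distance derivative
    size = np.full_like(va, 0.5 * (size_a + size_b))       # average size
    return np.column_stack([va, vb, size, dist[1:], ddist])

def nearest_mean(feature_seq, model_means):
    """Return the interaction label whose model mean vector is closest
    to the mean feature vector of the observed sequence."""
    mean_vec = feature_seq.mean(axis=0)
    return min(model_means,
               key=lambda k: np.linalg.norm(mean_vec - model_means[k]))
```

For instance, two stationary people standing far apart produce near-zero velocities and a large, constant relative distance, which matches a "staying apart" model rather than a "meeting" one.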
Action and interaction recognition, such as standing, walking, meeting people and
carrying objects, is addressed by Haritaoglu, Harwood & Davis (1998, 2000). A
real-time tracking system, which is based on outdoor monocular grayscale
images taken from a stationary visible or infrared camera, is introduced.
Grayscale textural appearance and shape information of a person are combined
into a textural temporal template, an extension of the temporal templates
defined by Bobick & Davis (1996).
Bobick & Davis (1996) introduced a real-time human activity recognition
method, which is based on a two-component image representation of motion. The
first component (Motion Energy Image, MEI) is a binary image, which displays
where motion has occurred during the movement of the person. The second one
(Motion History Image, MHI) is a scalar image, which indicates the temporal
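A minimal sketch of the two templates, assuming the input is a sequence of binary motion masks (e.g., thresholded frame differences) and using a simple linear decay over a window of tau frames (the function name is an assumption):

```python
import numpy as np

def mei_mhi(motion_masks, tau=None):
    """Compute a Motion Energy Image (binary union of where motion occurred)
    and a Motion History Image (brighter where motion happened more recently)
    from a sequence of binary motion masks."""
    masks = np.asarray(motion_masks, bool)
    tau = len(masks) if tau is None else tau   # temporal extent of the template
    mhi = np.zeros(masks.shape[1:], float)
    for frame in masks:
        # Current motion is set to tau; older motion decays by 1 per frame.
        mhi = np.where(frame, tau, np.maximum(mhi - 1.0, 0.0))
    mei = mhi > 0                              # MEI: any motion within the window
    return mei, mhi
```

The MEI answers "where did motion occur?", while the MHI additionally encodes "how recently?", so the pair forms a compact per-sequence representation for matching.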