In the mobile robotic system of Groß et al. (2006), a particle filter for tracking several objects simultaneously is used in combination with the AdaBoost face detector (Viola and Jones, 2001). Cylinder models are adapted to depth data acquired with a laser scanner, while images are acquired with a fisheye camera and a standard camera. The distribution of the positions of persons in the vicinity of the robot is modelled by a mixture of Gaussians, leading to a Bayesian framework that allows all available sensor information to be integrated in order to accept or reject object hypotheses. When a person is detected, the system attempts to recognise a deictic gesture and to infer the pointing direction from the available monocular image data. Using the position of the head as a reference, an image region containing the head and another region containing the arm are determined. The distance and the direction of the position to which the person is pointing are then determined using multilayer perceptron classification and regression. The achieved distance accuracy typically corresponds to about 50–200 mm and the angular accuracy to approximately 10°.
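To make the role of the Gaussian mixture more concrete, the following Python fragment sketches how an object hypothesis might be accepted or rejected according to its plausibility under a mixture-of-Gaussians prior over person positions. It is a minimal illustration, not the implementation of Groß et al. (2006); the mixture parameters, the threshold and all names are invented for the example.

    import numpy as np

    # Hypothetical parameters of a Gaussian mixture over person
    # positions (x, y) in metres in the robot's vicinity; in the
    # original system these would be estimated from sensor data.
    weights = np.array([0.6, 0.4])
    means = np.array([[1.0, 0.5], [2.5, -0.3]])
    covs = np.array([np.eye(2) * 0.2, np.eye(2) * 0.4])

    def mixture_density(x):
        """Evaluate the two-dimensional Gaussian-mixture density at x."""
        total = 0.0
        for w, mu, cov in zip(weights, means, covs):
            diff = x - mu
            norm = 1.0 / (2.0 * np.pi * np.sqrt(np.linalg.det(cov)))
            total += w * norm * np.exp(-0.5 * diff @ np.linalg.inv(cov) @ diff)
        return total

    def accept_hypothesis(position, threshold=0.05):
        """Accept an object hypothesis only if it is plausible under the prior."""
        return mixture_density(np.asarray(position, dtype=float)) >= threshold

    print(accept_hypothesis([1.1, 0.6]))  # near a mixture mode -> True
    print(accept_hypothesis([8.0, 8.0]))  # far from all modes  -> False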
Fritsch et al. (2004) rely on the extraction of skin-coloured regions to determine spatial trajectories and recognise activities based on a particle filter approach. They include the integration of knowledge about the object with which an interaction is performed, which they term 'context information'. A gesture is assumed to occur in a 'situational context', given by the conditions to be fulfilled for it to be performed and by its impact on the environment, as well as in a 'spatial context', i.e. the spatial relations between the hand trajectories and the objects in the scene, e.g. denoting which object is being gripped. A 'context area' in the image is given by the part of the scene in which objects are expected to be manipulated. Fritsch et al. (2004) integrate the situational context by restricting the particles to those consistent with the requirements of the action, while the spatial context is considered by suppressing particles that disagree with the observed object information. Bauckhage et al. (2005) describe an integration of these concepts into a cognitive vision system for the recognition of actions, also addressing the issue of evaluation.
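The following Python sketch illustrates the general idea of integrating such context into a particle filter's weight update: particles that violate the situational context receive zero weight, while the spatial context down-weights particles whose state disagrees with the observed object position. It is a schematic one-dimensional example under invented parameters, not a reproduction of the system of Fritsch et al. (2004).

    import numpy as np

    rng = np.random.default_rng(0)

    # Schematic one-dimensional hand position tracked by the filter;
    # each particle is one hypothesis about the current hand position.
    n = 200
    particles = rng.uniform(0.0, 1.0, size=n)
    weights = np.full(n, 1.0 / n)

    def situational_ok(gesture_active=True):
        # Situational context: the gesture model is only admissible
        # while its preconditions hold (e.g. the action has started).
        return gesture_active

    def spatial_score(p, object_pos=0.7, tol=0.3):
        # Spatial context: particles whose hypothesised hand position
        # disagrees with the observed object position are suppressed.
        return np.exp(-((p - object_pos) ** 2) / (2.0 * tol ** 2))

    def likelihood(p, measurement=0.65, sigma=0.1):
        # Ordinary measurement likelihood of the particle filter.
        return np.exp(-((p - measurement) ** 2) / (2.0 * sigma ** 2))

    # Weight update combining the measurement with both context cues.
    if situational_ok():
        weights *= likelihood(particles) * spatial_score(particles)
        weights /= weights.sum()
    else:
        # No admissible particles: the gesture hypothesis is rejected.
        weights[:] = 0.0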
7.1.2.4 Discussion in the Context of Industrial Safety Systems
The methods described above make it possible to recognise the meaning of observed gestures, and it has been shown that contextual information, such as relations between the hand and objects in the scene, can be incorporated.
Many of the systems described above rely on a detection of the hand based on skin colour. An important drawback of these approaches is that colour-based segmentation tends to be unstable in the presence of variable illumination, especially under changing spectral characteristics of the incident light, as the sketch below illustrates.
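The following Python/OpenCV fragment implements a typical fixed-range skin-colour segmentation in HSV space. The threshold bounds are illustrative only and would have to be retuned whenever the illumination changes, which is precisely the fragility discussed here.

    import cv2
    import numpy as np

    # Illustrative fixed HSV skin-colour bounds; real systems must
    # retune such thresholds whenever the illumination changes.
    LOWER = np.array([0, 40, 60], dtype=np.uint8)
    UPPER = np.array([25, 180, 255], dtype=np.uint8)

    def skin_mask(bgr_image):
        """Return a binary mask of skin-coloured pixels."""
        hsv = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2HSV)
        mask = cv2.inRange(hsv, LOWER, UPPER)
        # Morphological opening removes isolated false positives.
        kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))
        return cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)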
Although adaptively modelling the colour distribution in the image may partially overcome this problem, skin-colour cues remain unsuitable for detecting humans in the industrial production environment, since in many domains of industrial production dedicated work clothes such as gloves are obligatory. An industrial safety system must therefore be able to recognise the human body or parts of it based on shape cues alone. Furthermore,