regions of interest around it. The preferred choice in this case is to detect
the operator's face, as robust solutions such as the Viola and Jones face detector
exist for accomplishing this task [52]. Once the location of the operator's face and,
implicitly, of the head are known, the positions of other body parts can be
inferred using simple geometric constraints. Figure 4 illustrates this idea with
several regions dynamically computed around the user's head, at roughly one
arm-length around the body. Detecting the face brings a further advantage: since the
face region exposes skin-colored pixels, on-the-fly calibration of color detectors
can be performed [7], which increases the system's robustness and adaptability to the
environment.
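
As a concrete illustration, the minimal sketch below uses OpenCV's Viola and Jones cascade to detect a face, places two illustrative regions of interest around the head using simple geometry, and samples the face box to calibrate an HSV skin model on the fly. The arm-length scale, the two-region layout, and the mean plus/minus two standard deviations thresholding rule are assumptions made here for illustration, not the exact procedures of [7] or [52].

    import cv2
    import numpy as np

    # Viola-Jones detector shipped with OpenCV.
    face_cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

    def detect_face(frame_gray):
        """Return the largest detected face as (x, y, w, h), or None."""
        faces = face_cascade.detectMultiScale(
            frame_gray, scaleFactor=1.2, minNeighbors=5, minSize=(60, 60))
        if len(faces) == 0:
            return None
        return max(faces, key=lambda f: f[2] * f[3])  # largest area

    def regions_around_head(face, arm_scale=3.0):
        """Place ROIs left and right of the head via simple geometry.
        arm_scale approximates one arm length as a multiple of face
        width (an illustrative value, not taken from the cited papers)."""
        x, y, w, h = face
        reach = int(arm_scale * w)
        return {
            1: (x - reach, y, reach, 2 * h),  # region 1: left of the head
            2: (x + w, y, reach, 2 * h),      # region 2: right of the head
        }

    def calibrate_skin_model(frame_hsv, face):
        """On-the-fly calibration: sample the inner face patch and derive
        an HSV skin range as mean +/- 2 standard deviations per channel
        (the +/- 2 sigma rule is an assumption for this sketch)."""
        x, y, w, h = face
        patch = frame_hsv[y + h // 4:y + 3 * h // 4,
                          x + w // 4:x + 3 * w // 4]
        pixels = patch.reshape(-1, 3).astype(np.float32)
        mean, std = pixels.mean(axis=0), pixels.std(axis=0)
        lo = np.clip(mean - 2 * std, 0, 255).astype(np.uint8)
        hi = np.clip(mean + 2 * std, 0, 255).astype(np.uint8)
        return lo, hi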
Fig. 4 The user's face can be robustly detected and subsequently used to infer the loca-
tions of other body parts. Following [7] and [32], special regions of interest may be defined
around the bounding rectangle of the user's face. When the hand is detected in a particular
region, an event is triggered signaling the start, continuation, or ending of a gesture. A
wave gesture, for example, can be described as the series of location events 1-2-1-2-1.
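
To make the event mechanism concrete, the following sketch (a hypothetical design, not code from [7] or [32]) records region-entry events and reports a wave when the recent history ends in the 1-2-1-2-1 pattern from the caption:

    from collections import deque

    WAVE_PATTERN = (1, 2, 1, 2, 1)

    class LocationEventRecognizer:
        """Accumulate region-entry events and match gesture patterns.
        The history length of 16 is an illustrative choice."""

        def __init__(self, history=16):
            self.events = deque(maxlen=history)

        def on_hand_in_region(self, region_id):
            """Record an event; consecutive repeats of the same region
            are collapsed so holding the hand still fires one event."""
            if not self.events or self.events[-1] != region_id:
                self.events.append(region_id)

        def wave_detected(self):
            recent = tuple(self.events)[-len(WAVE_PATTERN):]
            return recent == WAVE_PATTERN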
Cerlinca et al. [7] define and use such regions of interest around the human body in
order to facilitate the segmentation and recognition of free-hand gestures. Their purpose
is to transmit commands to a mobile robot that may operate in any environment.
Ten such regions are defined around the operator's face, which is reliably detected
using the Viola and Jones classifier. Skin-color thresholding is then applied
to these regions only, in order to detect the operator's hands. Combinations of several
active regions (those in which hands are detected) correspond to various gesture
commands such as move forward, move backward, turn left, and turn right.
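
The mapping from active-region combinations to commands can be sketched as a simple lookup table. The region identifiers and command names below are hypothetical placeholders; the actual ten-region layout and command set are defined in [7]:

    # Each key is a set of simultaneously active regions (hands detected
    # inside); each value is the robot command it triggers. Illustrative only.
    COMMANDS = {
        frozenset({1}): "turn_left",
        frozenset({2}): "turn_right",
        frozenset({1, 2}): "move_forward",
        frozenset({3}): "move_backward",
    }

    def command_for(active_regions):
        """Map the set of active regions to a command, or None."""
        return COMMANDS.get(frozenset(active_regions))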
Marcel [32] performs a similar partitioning of the space around the user's face,
combined with anthropometric constraints, in order to detect the location of the hand.
The scenario is very similar to the one illustrated in Figure 4. The user's intention to
produce a gesture is detected by monitoring the active windows formed in
the body-face space. When a skin-colored object is detected within one of these predefined
windows, it is assumed to be the user's hand and a posture recognizer is then
employed.
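
A window test of this kind might look as follows, reusing the HSV skin range calibrated from the face earlier; the 20% fill-ratio threshold is an illustrative assumption rather than Marcel's actual criterion [32]:

    import cv2

    def window_is_active(frame_hsv, window, skin_lo, skin_hi, min_ratio=0.20):
        """Declare a predefined window active when at least min_ratio of
        its pixels fall within the calibrated skin range; the detected
        object is then assumed to be the hand and handed to a posture
        recognizer."""
        x, y, w, h = window
        roi = frame_hsv[y:y + h, x:x + w]
        mask = cv2.inRange(roi, skin_lo, skin_hi)
        return cv2.countNonZero(mask) / float(w * h) >= min_ratio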
Iannizzotto et al. [19] use the same principle of location-based events for de-
tecting when a gesture command starts. The authors' Graylevel VisualGlove system
does not track the whole hand but only the thumb and index finger, in order to
simulate mouse operations. When both fingers are detected inside a predefined area