with regard to the feedback mechanisms that transform into events in the human-computer dialogue. The main purpose of the chapter is thus to extend the current understanding of gesture-based interfaces by means of meaningful events and by looking at the topic from two different but equally important perspectives.
2 Events for Spotting and Detecting Gestures
Techniques for spotting gestures in video sequences are implemented by detecting and monitoring (or tracking) custom events defined using location, time and posture criteria. The successful detection of one event triggers gesture recording, while detection of another starts the gesture recognizer that classifies the recorded motion. This section analyses the various event types and their associated criteria, as well as the algorithms and techniques employed for segmenting gestures from sequences of continuous motion. The focus will be primarily on vision-based computing, but other capture technologies will be mentioned when the techniques and algorithms they use for segmenting human motion are relevant to the discussion. Jaimes and Sebe [45], Poppe [38], Moeslund et al. [33, 34] and Erol et al. [12] provide extensive surveys on vision-based motion capture and analysis as well as on multimodal human-computer interaction, and they represent good starting points for a general overview of the advances in these fields.
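As a concrete illustration of this event-driven scheme, the following minimal Python sketch records motion between a start event and an end event and then hands the buffered frames to a recognizer. All names (GestureSpotter, Frame, recognize) and the specific criteria are hypothetical choices for illustration; the sketch assumes a tracker already supplies per-frame hand locations and posture labels:

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

Point = Tuple[float, float]

@dataclass
class Frame:
    """One observation from the tracker: hand location and posture label."""
    location: Point   # hand centroid in image coordinates
    posture: str      # e.g. "open_palm", "fist", "point"
    timestamp: float  # seconds

def in_region(p: Point, region: Tuple[float, float, float, float]) -> bool:
    """Location criterion: is the point inside an axis-aligned rectangle?"""
    x, y = p
    x0, y0, x1, y1 = region
    return x0 <= x <= x1 and y0 <= y <= y1

@dataclass
class GestureSpotter:
    """Event-driven segmenter: a start event begins recording, an end
    event stops it and passes the recorded motion to the recognizer."""
    start_region: Tuple[float, float, float, float]
    start_posture: str
    end_posture: str
    recognize: Callable[[List[Frame]], str]
    recording: bool = False
    buffer: List[Frame] = field(default_factory=list)

    def process(self, frame: Frame) -> Optional[str]:
        if not self.recording:
            # Start event: required posture detected inside the predefined region.
            if (frame.posture == self.start_posture
                    and in_region(frame.location, self.start_region)):
                self.recording = True
                self.buffer = [frame]
        else:
            self.buffer.append(frame)
            # End event: the ending posture closes the segment.
            if frame.posture == self.end_posture:
                self.recording = False
                return self.recognize(self.buffer)
        return None

# Hypothetical recognizer: classifies by net horizontal displacement
# (illustration only; a real system would use DTW, HMMs, etc.).
def recognize(frames: List[Frame]) -> str:
    dx = frames[-1].location[0] - frames[0].location[0]
    return "swipe_right" if dx > 0 else "swipe_left"

spotter = GestureSpotter(start_region=(0.0, 0.0, 100.0, 100.0),
                         start_posture="fist",
                         end_posture="open_palm",
                         recognize=recognize)

stream = [
    Frame((50.0, 50.0), "fist", 0.00),        # start event fires here
    Frame((120.0, 55.0), "fist", 0.03),
    Frame((200.0, 60.0), "open_palm", 0.06),  # end event fires here
]
for f in stream:
    label = spotter.process(f)
    if label is not None:
        print("recognized:", label)           # -> recognized: swipe_right
```

The predicates tested in process correspond directly to the location and posture criteria enumerated below; a time criterion could be added in the same style by checking frame timestamps against minimum and maximum gesture durations.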
We identify four different event types that have been used extensively either
singly or in various combinations for segmenting gestures in video sequences:
• Location represents a powerful cue for detecting postures as well as for segmenting gestures of interest from continuous movement. Requiring that a gesture starts or ends in a predefined region, or knowing/learning that some locations in the scene are more likely to contain valid gestures, greatly reduces algorithmic complexity;
• Posture information allows marking gesture commands in a way that feels natural and accessible for users to perform and to model cognitively: for example, a given posture could mark the beginning of a gesture while another signals its ending. Posture is a robust cue that gives both the user and the system the certainty that a gesture command is being entered: the system is able to filter out the majority of movements, retaining only the actual gesture commands. Users, in turn, build a mental model of the interaction process: commands are issued only if specified postures are executed, in a manner similar to how click-like events work in standard WIMP interfaces;
• Tap and touch events can be detected by touch-sensitive materials as well as by video cameras (most horizontal interactive surfaces use IR video cameras in order to detect touch events on the tabletop). A tap or a touch is clearly perceived as a marking event from both the system's and the user's perspective: touching unambiguously signals both intent and command during the interaction process;
• Custom-based events other than the above may additionally be used to ease the gesture detection process even further. They usually relate to various