1 Introduction
Gestures in interfaces are becoming increasingly popular as advances in technology
allow a proliferation of software and devices with such interaction capabilities [40].
Growing interest in adopting such interfaces can be observed among the great mass
of consumers owning personal mobile devices that allow entering strokes via a stylus
or issuing touch-based commands. The same interest is equally present in the various
research communities concerned with gesture acquisition and analysis, such as
human-computer interaction, computer vision, and pattern recognition [12, 33, 34,
38, 45]. Although the technology is clearly in its early stages (there is a huge
difference between entering strokes on a mobile device and interacting freely with
computer systems or ambient media via natural gesturing), the motivation driving
these developments is a strong one: gestures are familiar, easy to perform, and
natural to understand. The common perception is that an interface capable of
correctly understanding and interpreting human gestures would be ideal, allowing
interactions in the human-computer dialogue similar to those of everyday life.
A great number of technologies are available today for capturing human input, and
video cameras represent one of them. The advantage that computer vision brings
over other technologies is that users are not bound to any equipment: there is
nothing to be worn or held that could encumber or burden the interaction. This
encourages free hand gestures, natural head movements, or even whole-body input,
in accordance with the needs of the application or task. The downside of this great
flexibility in interaction is the amount of computation required for processing
video, as well as the current state of computer vision technology, a field of
research still in its infancy, with far-from-perfect recognition rates. One way
around these inconveniences is to control the working scenario, but this sometimes
proves challenging if not impossible (as in the case of a mobile robot that
interprets gestures but travels freely outside the laboratory). Another is to use
specific video events that help determine when a gesture started and when it ended,
by event-based detection, reasoning, and inference under simple yet robust
assumptions.
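To make the idea of event-based start/end detection concrete, the following is a
minimal sketch, not taken from the chapter: it assumes motion energy (mean absolute
difference between consecutive frames) as the video event, with hysteresis
thresholds to mark gesture start and end. The function name, thresholds, and
synthetic frames are illustrative assumptions.

```python
import numpy as np

def detect_gesture_events(frames, start_thresh=10.0, end_thresh=5.0):
    """Infer (start, end) frame-index pairs from motion energy.

    Illustrative sketch: motion energy is the mean absolute difference
    between consecutive frames; hysteresis (start_thresh > end_thresh)
    keeps the detector from flickering between states.
    """
    events, in_gesture, start = [], False, None
    for i in range(1, len(frames)):
        energy = np.abs(frames[i].astype(float) - frames[i - 1].astype(float)).mean()
        if not in_gesture and energy > start_thresh:
            in_gesture, start = True, i          # gesture-start event
        elif in_gesture and energy < end_thresh:
            in_gesture = False                   # gesture-end event
            events.append((start, i))
    if in_gesture:                               # sequence ended mid-gesture
        events.append((start, len(frames) - 1))
    return events

# Synthetic sequence: static frames, strong motion, then static again.
rng = np.random.default_rng(0)
still = np.zeros((8, 8), dtype=np.uint8)
moving = [rng.integers(0, 255, (8, 8), dtype=np.uint8) for _ in range(5)]
frames = [still] * 5 + moving + [still] * 5
print(detect_gesture_events(frames))  # → [(5, 11)]
```

The hysteresis gap between the two thresholds is the "simple yet robust
assumption": a gesture must produce clearly high motion to begin and clearly low
motion to end, so brief noise spikes do not split one gesture into several events.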
This chapter discusses events that aid the detection of gestures in video sequences
by identifying and describing the most frequent approaches known to work for this
task. Criteria such as location, time, posture, or combinations thereof have been
used to draw inferences about both the users and the gestures they perform. A
gesture is thus specified by various predefined events that are detected or
monitored in the video sequence. Equally important, the chapter considers what
happens from the user's point of view: once a gesture has been successfully
detected and recognized, and its associated action executed, the feedback the users
receive proves extremely important for the fluency of the interaction. From the
users' perspective, a recognized gesture that has been acknowledged appropriately
represents an important event in the newly established communication, as it would
be in general human-human dialogue. This can be explained by humans' predisposition
and willingness to