An Appearance-Based Prior for Hand Tracking
Mathias Kölsch
The MOVES Institute, Naval Postgraduate School, Monterey, CA
Abstract. Reliable hand detection and tracking in passive 2D video remains a
challenge. Yet the consumer market for gesture-based interaction is expanding
rapidly, and surveillance systems that can deduce fine-grained human activities
involving hand and arm postures are in high demand. In this paper, we present
a hand tracking method that does not require reliable detection. We built it on
top of “Flocks of Features”, which combines grey-level optical flow, a “flocking”
constraint, and a learned foreground color distribution. By adding probabilistic
(instead of binary-classified) detections based on grey-level appearance as
an additional image cue, we show improved tracking performance despite rapid
hand movements and posture changes. This helps overcome tracking difficulties
in texture-rich and skin-colored environments, improving performance on a
10-minute collection of video clips from 75% to 86% (see examples on our
website, footnote 1).
1 Introduction
While reliable and fast methods to detect and track rigid objects such as faces and cars
have matured in the last decade, articulated objects, such as the human body and hand,
continue to pose difficult problems to recognition algorithms. The consumer demand
for gesture-based interaction, exemplified by the success of and anticipation for the
game platforms Wii and Project Natal, has brought about sensing modalities other than
color video, including acquisition of depth through active sensors. These sensors are
more expensive and less prevalent than video cameras. In particular, human activity
monitoring for elderly care and surveillance applications has to rely on legacy sensors.
Articulated objects present such a difficult challenge because almost every aspect
of their characteristics can change: their orientation, size, and shape (silhouette), their
apparent color, and their appearance, especially due to self-occlusion. No single image
cue can be expected to contain sufficient information for detection or tracking. Hence,
our approach to overcoming these difficulties is to combine many image cues into a rich
feature vector that permits more reliable, multimodal hand tracking. We started with a
multi-cue method called “Flocks of Features” [8] (FoF) that combines grey-level
LK-feature tracking, a proximity constraint on the tracked features, and a learned
foreground color distribution. It can track fast movements and posture changes despite a
dynamic background. Still, tracking is difficult if the hand undergoes posture changes
while the background color is similar to the tracked object's color and the background
contains high gradients to which the LK-features might attach. The hand's appearance,
that is, all or part of its grey-level texture, should be taken into consideration for
tracking as well. In this paper, we present a method that allows for fast calculation of a
1 http://www.movesinstitute.org/%7Ekolsch/paper241Video.wmv
 