Fig. 1. The appearance-based prior for select hand images
probability that an area's appearance could be attributed to a hand. Features that have
strayed from the flock are then moved to areas of high color and appearance probability.
While a traditional FoF is agnostic of the object it is tracking except for color and
motion consistency, this improved FoF has knowledge of the object's appearance.
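The relocation step can be illustrated with a minimal sketch. It is not the implementation used in this paper: the function name relocate_strays, the product used to combine the two probabilities, and the rule that a feature has "strayed" once it lies farther than max_dist from the flock median are all assumptions made for illustration. The sketch also moves every stray to the same best pixel, whereas a real flock would additionally keep features a minimum distance apart.

```python
from statistics import median


def relocate_strays(features, color_prob, appearance_prob, max_dist):
    """Move features that left the flock to a high-probability pixel.

    features        -- list of (x, y) feature positions
    color_prob      -- 2D list [row][col] of color probabilities
    appearance_prob -- 2D list [row][col] of appearance probabilities
    max_dist        -- assumed maximum distance from the flock median
    """
    # Flock center as the per-axis median of all feature positions.
    cx = median(x for x, _ in features)
    cy = median(y for _, y in features)
    height, width = len(color_prob), len(color_prob[0])
    limit = max_dist ** 2

    # Find the pixel near the flock center with the highest combined
    # probability (combining by product is an assumption of this sketch).
    best, best_p = None, -1.0
    for y in range(height):
        for x in range(width):
            if (x - cx) ** 2 + (y - cy) ** 2 <= limit:
                p = color_prob[y][x] * appearance_prob[y][x]
                if p > best_p:
                    best, best_p = (x, y), p

    # Strayed features jump to that pixel; all others stay in place.
    relocated = []
    for x, y in features:
        strayed = (x - cx) ** 2 + (y - cy) ** 2 > limit
        relocated.append(best if strayed and best is not None else (x, y))
    return relocated
```

With a color-only map, the target pixel would be chosen by color alone; multiplying in the appearance probability is what lets the flock prefer regions that also look like a hand.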
To obtain this appearance-based probability, we trained a Viola-Jones-based detector
[20] (VJ) on hands in arbitrary postures and then attempted hand detections at scales
similar to that of the tracked object. Yet this achieves only poor performance: hands
are too varied in appearance for reliable detection. The main innovation of this paper
is a method that utilizes incomplete detections to make predictions about the presence
of a hand. Incomplete detections are areas that successfully passed some but not all
stages of the VJ cascade. Scores obtained from incomplete detections are integrated over
scale and space to yield a prior probability per pixel (see Fig. 1). This image cue is
largely orthogonal to color and optical flow, hence providing new information on which
the tracking decision can be based.
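A minimal sketch of how such scores might be accumulated into a per-pixel map is given below. The actual scoring and integration are defined in the method section and may differ. The PartialDetection record, the score taken as the fraction of cascade stages passed, and the normalization by the peak value are assumptions made here for illustration only.

```python
from dataclasses import dataclass


@dataclass
class PartialDetection:
    x: int              # left column of the detection window
    y: int              # top row of the detection window
    size: int           # window side length in pixels (encodes the scale)
    stages_passed: int  # cascade stages passed before rejection


def appearance_prior(detections, width, height, total_stages):
    """Accumulate partial-detection scores into a per-pixel map in [0, 1]."""
    acc = [[0.0] * width for _ in range(height)]
    for d in detections:
        # Assumed score: fraction of cascade stages the window passed.
        score = d.stages_passed / total_stages
        # Add the score to every pixel the window covers. Windows of
        # different sizes overlap, which sums evidence over scale and space.
        for row in range(max(0, d.y), min(height, d.y + d.size)):
            for col in range(max(0, d.x), min(width, d.x + d.size)):
                acc[row][col] += score
    peak = max(max(r) for r in acc)
    if peak == 0.0:
        return acc  # no evidence anywhere
    # Assumed normalization: scale so the strongest pixel equals 1.
    return [[v / peak for v in r] for r in acc]
```

Pixels covered by many windows that each passed most stages end up near 1, while pixels that no window covered stay at 0, so even windows the full detector rejected contribute evidence.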
The paper is organized as follows. We first discuss the background against which this
research has been conducted, including related work. We then present the method to
calculate the prior in detail and explain how it is built into FoF tracking. The following
experiment section describes the test data and evaluation method, before we present and
discuss the results in the last two sections.
2 Background
We briefly discuss related work on object tracking, the traditional Flock of Features
(FoF) approach and methods for incomplete detections, or object priors.
2.1 Object Tracking
Rigid objects with a known shape can be tracked reliably against arbitrary backgrounds
in grey-level images [1,7]. However, when the object's shape varies widely, as with
gesturing hands, most approaches resort to shape-free color information or background
differencing [4,9,15]. Yet these approaches rely, for example, on a stationary camera
and are not robust to the related unimodal failure modes. Multi-cue methods, on the other