An Appearance-Based Prior for Hand Tracking - Advanced Concepts for Intelligent Vision Systems

Fig. 1. The appearance-based prior for select hand images

probability that an area's appearance could be attributed to a hand. Features that have strayed from the flock are then moved to areas of high color and appearance probability. While a traditional FoF is agnostic of the object it is tracking except for color and motion consistency, this improved FoF has knowledge of the object's appearance.
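The relocation step described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the paper's implementation: the function name `relocate_strays`, the use of the median as flock center, the `max_dist` threshold, the product of the two probability maps, and the random sampling of new positions are all hypothetical choices made for this example.

```python
import numpy as np

def relocate_strays(features, color_prob, appearance_prob, max_dist, rng=None):
    """Move features that strayed from the flock to probable hand pixels.

    features        -- (N, 2) array of (x, y) feature positions
    color_prob      -- (H, W) per-pixel color probability map
    appearance_prob -- (H, W) per-pixel appearance probability map
    max_dist        -- assumed maximum distance from the flock center
    """
    rng = np.random.default_rng() if rng is None else rng
    features = np.asarray(features, dtype=np.float64)

    # Assumption: the flock center is the per-axis median of all features.
    center = np.median(features, axis=0)
    strayed = np.linalg.norm(features - center, axis=1) > max_dist
    if not strayed.any():
        return features

    # Assumption: the two cues are combined by simple multiplication.
    combined = color_prob * appearance_prob
    height, width = combined.shape
    ys, xs = np.mgrid[0:height, 0:width]

    # Only pixels close enough to the flock center are candidate targets.
    near = np.hypot(xs - center[0], ys - center[1]) <= max_dist
    weights = np.where(near, combined, 0.0).ravel()
    total = weights.sum()
    if total == 0:
        return features  # no plausible target nearby, leave features as is

    # Draw one new pixel per strayed feature, proportional to probability.
    idx = rng.choice(weights.size, size=int(strayed.sum()), p=weights / total)
    relocated = features.copy()
    relocated[strayed, 0] = idx % width   # x coordinate
    relocated[strayed, 1] = idx // width  # y coordinate
    return relocated
```

Sampling proportionally to the combined map, rather than always taking its maximum, keeps several relocated features from collapsing onto a single pixel.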

To obtain this appearance-based probability, we trained a Viola-Jones-based detector [20] (VJ) on hands in arbitrary postures and then attempted hand detections at scales similar to that of the tracked object. On its own, however, this achieves only poor performance: hands are too varied in appearance for reliable detection. The main innovation of this paper is a method that utilizes incomplete detections to make predictions about the presence of a hand. Incomplete detections are areas that successfully passed some but not all VJ cascades. Scores obtained from incomplete detections are integrated over scale and space to yield a prior probability per pixel (see Fig. 1). This image cue is largely orthogonal to color and optical flow, hence providing new information on which the tracking decision can be based.
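As a rough illustration of integrating incomplete-detection scores over scale and space, consider the minimal sketch below. It is not the scoring scheme the paper presents later. It assumes that each scanned window reports how many cascade stages it passed, that the score is the fraction of stages passed, and that the accumulated map is normalized by its maximum; the name `appearance_prior` and its parameters are hypothetical.

```python
import numpy as np

def appearance_prior(shape, detections, num_stages):
    """Accumulate incomplete-detection scores into a per-pixel map.

    shape      -- (height, width) of the image
    detections -- iterable of (x, y, w, h, stages_passed) windows, any scale
    num_stages -- total number of cascade stages (assumed to be known)
    """
    acc = np.zeros(shape, dtype=np.float64)
    for x, y, w, h, stages_passed in detections:
        # Assumed score: fraction of cascade stages the window passed.
        score = stages_passed / num_stages
        # Clip the window to the image, then add its score to every
        # pixel it covers. Windows of different sizes (scales) and
        # positions (space) all contribute to the same map.
        y0, y1 = max(y, 0), min(y + h, shape[0])
        x0, x1 = max(x, 0), min(x + w, shape[1])
        if y0 < y1 and x0 < x1:
            acc[y0:y1, x0:x1] += score
    peak = acc.max()
    # Assumed normalization: scale the map to [0, 1] by its maximum.
    return acc / peak if peak > 0 else acc
```

Pixels covered by many windows that each passed most of the cascade end up with values near 1, while pixels covered only by windows rejected early stay near 0.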

The paper is organized as follows. We first discuss the background against which this research has been conducted, including related work. We then present the method to calculate the prior in detail and explain how it is built into FoF tracking. The following experiment section describes the test data and evaluation method, before we present and discuss the results in the last two sections.

2 Background

We briefly discuss related work on object tracking, the traditional Flock of Features (FoF) approach, and methods for incomplete detections, or object priors.

2.1 Object Tracking

Rigid objects with a known shape can be tracked reliably against arbitrary backgrounds in grey-level images [1,7]. However, when the object's shape varies vastly, such as with gesturing hands, most approaches resort to shape-free color information or background differencing [4,9,15]. Yet these approaches rely, for example, on a stationary camera and are not robust to related unimodal failure modes. Multi-cue methods, on the other
