An Appearance-Based Prior for Hand Tracking - Advanced Concepts for Intelligent Vision Systems


explicitly. Here, we introduce a probabilistic appearance-based model that helps constrain the feature locations without placing restrictions on the possible hand configurations and without incurring extraneous computational costs.

2.3 Object Priors

Whereas traditional object detection methods make a binary decision about the presence of the object of interest, our goal is to estimate the probability that the object is present and to delay the classification decision. Also, instead of a decision for rectangular areas, we need to know the probability per pixel. Lastly, a test area implies a hypothesis about the object's scale, yet we would like an estimate irrespective of scale.

In principle, many object detectors are capable of reporting a score instead of a thresholded classification. Take a PCA-based [19] or wavelet-based [12,14] object descriptor, for example: it measures the distance of the observation from the training mean in image or feature space. A method is particularly suitable for articulated objects if the different appearances are not aggregated and reduced to a mean. Instead, it must be able to learn dissimilar appearances. For describing dissimilar objects, shape as a prior probability has been applied successfully to segmentation and tracking, for example, in an application of the powerful level-set methods [5]. However, appearance-based methods are likely to outperform shape-based methods for natural objects. Yet, appearance-based priors have only recently become a popular alternative. Most notable are the excellent tracking and segmentation results of Leibe and Schiele et al. [10].

3 Method

Our method makes three improvements to FoF tracking. First, a posture-independent hand detector is applied to the image at multiple scales, reporting unclassified scores for hand presence. Second, a per-pixel hand probability is calculated from these multi-scale scores of image areas. Third, this hand prior is integrated into the FoF tracking as a third image cue and observation modality. This section details each of these steps.
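As an illustration of the second step only, the following minimal sketch turns area scores from several scales into a per-pixel map. It assumes that each scanned area is an axis-aligned rectangle with a score in [0, 1] and that a pixel's value is the mean score of all areas covering it. The averaging rule, the function name `per_pixel_prior`, and the `(x, y, w, h, score)` tuple layout are assumptions made for this example, not the combination rule used by the method described here.

```python
import numpy as np


def per_pixel_prior(shape, detections):
    """Average area scores into a per-pixel map (illustrative assumption).

    shape      -- (height, width) of the image
    detections -- iterable of (x, y, w, h, score) rectangles, any scale
    """
    height, width = shape
    total = np.zeros(shape, dtype=float)  # sum of scores covering each pixel
    count = np.zeros(shape, dtype=float)  # number of areas covering each pixel

    for x, y, w, h, score in detections:
        # Clip the rectangle to the image bounds.
        x0, y0 = max(x, 0), max(y, 0)
        x1, y1 = min(x + w, width), min(y + h, height)
        if x0 >= x1 or y0 >= y1:
            continue
        total[y0:y1, x0:x1] += score
        count[y0:y1, x0:x1] += 1.0

    # Pixels that no area covers keep a value of zero.
    prior = np.zeros(shape, dtype=float)
    np.divide(total, count, out=prior, where=count > 0)
    return prior
```

Because rectangles of every scale contribute to the same map, the result no longer depends on a single scale hypothesis, which is the property asked for in Section 2.3.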

3.1 Hand Scores

If hands could be detected reliably in any posture, tracking by detection would be viable. However, since hands are too varied in appearance, we avoid making the binary classification decision and instead obtain a probabilistic score that directly aids tracking. To calculate a score for how likely an image area is to contain an object of interest (at a certain scale and the proper position inside the area), we chose to modify Viola and Jones' detection approach [20] because a) it is very fast, permitting real-time image scanning, b) it is inherently based on local image features, benefiting articulated objects (detect the fingers, not the hand), c) its iterative bootstrapping training method is naturally suited to increasing levels of confidence for object presence, and d) we had prior experience with this method. We are currently evaluating other approaches to calculate this score.
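One plausible way to obtain a graded score from a staged detector, rather than an accept/reject decision, is to report how far an image area progresses through the stages. The sketch below assumes exactly that: each stage is a hypothetical callable returning True or False, and the score is the fraction of consecutive stages passed. The name `cascade_score` and this scoring rule are assumptions for illustration and are not taken from the modification described in this paper.

```python
from typing import Any, Callable, Sequence

# A stage is modeled as any callable that accepts an image area and
# returns True (pass) or False (reject). This type is hypothetical.
Stage = Callable[[Any], bool]


def cascade_score(stages: Sequence[Stage], window: Any) -> float:
    """Return the fraction of consecutive stages that the window passes."""
    if not stages:
        return 0.0
    passed = 0
    for stage in stages:
        if not stage(window):
            break  # a standard cascade would reject the window here
        passed += 1
    return passed / len(stages)
```

A window rejected by the first stage scores 0, and one that passes every stage scores 1, so later stages correspond to higher confidence in object presence.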

The typical VJ cascade is built with AdaBoost [6] training and consists of N stages, each of which is a linear combination of M weak classifiers. Weak classifiers $h_t(x) \in \{0, 1\}$ make their decision based on intensity comparisons between rectangular
