explicitly. Here, we introduce a probabilistic appearance-based model that helps constrain the feature locations without placing restrictions on the possible hand configurations and without incurring excessive computational cost.
2.3 Object Priors
Whereas traditional object detection methods make a binary decision about the presence
of the object of interest, our goal is to estimate the probability for the object and to delay
the classification decision. Also, instead of a decision for rectangular areas, we need to
know the probability per pixel. Lastly, a test area implies a hypothesis about the object's
scale, yet we would like an estimate irrespective of scale.
In principle, many object detectors are capable of reporting a score instead of a
thresholded classification. Take a PCA-based [19] or wavelet-based [12,14] object de-
scriptor, for example: it measures the distance of the observation from the training mean
in image- or feature space. A method is particularly suitable for articulated objects if the
different appearances are not aggregated and reduced to a mean. Instead, it must be able
to learn dissimilar appearances. For describing dissimilar objects, shape has been applied successfully as a prior probability in segmentation and tracking, for example in applications of the powerful level-set methods [5]. However, appearance-based methods are likely to outperform shape-based methods for natural objects, yet appearance-based priors have only recently become a popular alternative. Most notable are the excellent tracking and segmentation results of Leibe and Schiele et al. [10].
3 Method
Our method makes three improvements to FoF tracking. First, a posture-independent hand detector is applied to the image at multiple scales, reporting unclassified scores for hand presence. Second, a per-pixel hand probability is calculated from these multi-scale scores of image areas. Third, this hand prior is integrated into FoF tracking as a third image cue and observation modality. This section details each of these steps.
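The second step, turning scores for rectangular areas at several scales into a per-pixel value, can be illustrated with a minimal sketch. The rule used here is purely an assumption for illustration, not the method described in this paper: every pixel takes the highest score of any scanned window that covers it, whatever that window's scale. The `Window` type and `per_pixel_prior` function are hypothetical, and scores are assumed to be already normalized to [0, 1], so the result is a normalized score map rather than a calibrated probability.

```python
from typing import List, NamedTuple, Sequence

class Window(NamedTuple):
    """One scanned image area (hypothetical representation)."""
    x: int        # left column
    y: int        # top row
    size: int     # side length in pixels; varies with the scan scale
    score: float  # detector score, assumed normalized to [0, 1]

def per_pixel_prior(windows: Sequence[Window], width: int, height: int) -> List[List[float]]:
    """Combine multi-scale window scores into a per-pixel map.

    Each pixel keeps the highest score of any window covering it,
    independent of that window's scale. Uncovered pixels stay at 0.0.
    """
    prior = [[0.0] * width for _ in range(height)]
    for w in windows:
        # Clip the window to the image bounds before visiting its pixels.
        for row in range(max(0, w.y), min(height, w.y + w.size)):
            for col in range(max(0, w.x), min(width, w.x + w.size)):
                prior[row][col] = max(prior[row][col], w.score)
    return prior

if __name__ == "__main__":
    # A small window (size 2) and a larger, higher-scoring one (size 3).
    windows = [Window(0, 0, 2, 0.3), Window(1, 1, 3, 0.8)]
    for row in per_pixel_prior(windows, width=4, height=4):
        print(row)
```

Taking the maximum avoids committing to a single scale hypothesis; averaging over covering windows would be an equally plausible alternative, at the cost of letting many low-scoring large windows dilute a strong small-scale response.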
3.1 Hand Scores
If hands could be detected reliably in any posture, tracking by detection would be viable. However, since hands vary too much in appearance, we avoid making the binary classification decision and instead obtain a probabilistic score that directly aids tracking. To score how likely an image area is to contain an object of interest (at a certain scale and at the proper position inside the area), we chose to modify Viola and Jones' detection approach [20] because a) it is very fast, permitting real-time image scanning, b) it is inherently based on local image features, which benefits articulated objects (detect the fingers, not the hand), c) its iterative bootstrapping training method is naturally suited to increasing levels of confidence for object presence, and d) we had prior experience with this method. We are currently evaluating other approaches to calculating this score.
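The text does not say how the modified cascade turns its staged decisions into a score. One common choice, assumed here and not confirmed by the source, is the fraction of cascade stages a window passes before it is rejected: each stage sums AdaBoost-weighted weak classifiers with outputs in {0, 1} and compares the sum to a stage threshold, so passing more stages corresponds to higher confidence. The weak-classifier rule h(x) = 1 if p·f(x) < p·θ follows the original Viola-Jones formulation; the class names, the `feature` callables (standing in for rectangle-feature responses), and all thresholds are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

@dataclass
class WeakClassifier:
    feature: Callable[[Sequence[float]], float]  # hypothetical rectangle-feature response
    threshold: float
    polarity: int  # +1 or -1
    alpha: float   # AdaBoost weight of this weak classifier

    def __call__(self, x: Sequence[float]) -> int:
        # h_t(x) in {0, 1}: 1 iff polarity * f(x) < polarity * threshold
        return 1 if self.polarity * self.feature(x) < self.polarity * self.threshold else 0

@dataclass
class Stage:
    weak: List[WeakClassifier]
    threshold: float

    def passes(self, x: Sequence[float]) -> bool:
        # A stage is a weighted (linear) combination of its weak classifiers.
        return sum(h.alpha * h(x) for h in self.weak) >= self.threshold

def cascade_score(stages: List[Stage], x: Sequence[float]) -> float:
    """Fraction of stages passed before the first rejection (assumed scoring rule)."""
    passed = 0
    for stage in stages:
        if not stage.passes(x):
            break
        passed += 1
    return passed / len(stages)

if __name__ == "__main__":
    # Three toy stages, each requiring a stronger response on feature x[0].
    stages = [Stage([WeakClassifier(lambda x: x[0], t, -1, 1.0)], 1.0)
              for t in (0.5, 0.7, 0.9)]
    for x in ([0.1], [0.8], [0.95]):
        print(x, cascade_score(stages, x))
```

Unlike a standard detector, which only reports windows that pass every stage, this variant keeps the depth reached by every window, so rejected areas still carry graded evidence for hand presence.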
The typical Viola-Jones (VJ) cascade is built with AdaBoost [6] training and consists of $N$ stages, each of which is a linear combination of $M$ weak classifiers. Weak classifiers $h_t(x) \in \{0, 1\}$ make their decision based on intensity comparisons between rectangular