An Appearance-Based Prior for Hand Tracking - Advanced Concepts for Intelligent Vision Systems

3.3 Per-Pixel Object Probability

VJ object detectors require the classifier to be scanned across the image, testing rectangular areas at increments in x- and y-position. To detect an object at different scales, either the image is down-scaled or the detector is up-scaled, typically in steps of 10-25%. After scanning, our modified detector returns one “score image” per scale, whose resolution equals the number of area tests in the x- and y-directions.
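The relation between scan increments and score-image resolution can be sketched as follows. This is only an illustration under assumed values: the function name, the 24-pixel base window, the fixed 2-pixel increment, and the 1.25 scale factor are hypothetical and not taken from the text.

```python
def score_image_shapes(img_w, img_h, win=24, step=2, scale_factor=1.25, n_scales=2):
    """Return (scale, cols, rows) for each detector scale.

    cols and rows are the numbers of window positions tested along x and y,
    i.e. the resolution of the score image at that scale.
    All default values are illustrative assumptions.
    """
    shapes = []
    scale = 1.0
    for _ in range(n_scales):
        w = round(win * scale)  # side length of the up-scaled detector window
        if w > img_w or w > img_h:
            break  # window no longer fits into the frame
        cols = (img_w - w) // step + 1
        rows = (img_h - w) // step + 1
        shapes.append((scale, cols, rows))
        scale *= scale_factor
    return shapes


shapes = score_image_shapes(320, 240)
```

For a 320x240 frame, these assumed settings give a 149x109 score image at the base scale and a 146x106 one after up-scaling the window by 25%.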

An object will typically be detected at multiple adjacent positions and frequently also at nearby scales. The traditional VJ detector heuristically post-processes these detections to combine them into one. Similarly, we devised a way to combine adjacent incomplete, rather than binary, detections. This has the effect of removing outliers and emphasizing actual detections. To this end, every score image is smoothed with a Gaussian. Next, a grey-level morphological dilation spreads the point-wise detections to cover a slightly larger area. The combination of the Gaussian covariance, the size of the morphological structuring element, and the number of dilation repetitions should roughly correspond to the size of the object of interest (the hand) within the rectangular VJ area. We chose two configurations, one keeping the point detections rather confined (O_t), and one “spreading” them out further (O_s, see Fig. 3 and Sec. 5). The resulting point-symmetrical spread is appropriate for hands. Other objects, such as pedestrians, would likely benefit from a spread pattern in the shape of the object.
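The smoothing and dilation steps might look roughly like the sketch below. It is a minimal pure-Python illustration, not the authors' implementation: the sigma value, the kernel radius, the 3x3 square structuring element, and the two repetition counts are assumptions chosen only to show a confined and a wider spread.

```python
import math


def gaussian_kernel(sigma, radius):
    """Normalised 1-D Gaussian weights for offsets -radius..radius."""
    weights = [math.exp(-(i * i) / (2.0 * sigma * sigma))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def convolve_rows(img, kernel):
    """Convolve every row with the kernel; pixels outside count as zero."""
    radius = len(kernel) // 2
    width = len(img[0])
    out = []
    for row in img:
        new_row = []
        for x in range(width):
            acc = 0.0
            for i, k in enumerate(kernel):
                xx = x + i - radius
                if 0 <= xx < width:
                    acc += k * row[xx]
            new_row.append(acc)
        out.append(new_row)
    return out


def transpose(img):
    return [list(col) for col in zip(*img)]


def smooth(img, sigma, radius=2):
    """Separable Gaussian blur: filter the rows, then the columns."""
    kernel = gaussian_kernel(sigma, radius)
    horizontal = convolve_rows(img, kernel)
    return transpose(convolve_rows(transpose(horizontal), kernel))


def dilate(img, repeats):
    """Grey-level dilation with an assumed 3x3 square element, repeated."""
    h, w = len(img), len(img[0])
    for _ in range(repeats):
        img = [[max(img[yy][xx]
                    for yy in range(max(0, y - 1), min(h, y + 2))
                    for xx in range(max(0, x - 1), min(w, x + 2)))
                for x in range(w)]
               for y in range(h)]
    return img


def spread(score_image, sigma, repeats):
    return dilate(smooth(score_image, sigma), repeats)


# A single point detection in an otherwise empty 15x15 score image.
score = [[0.0] * 15 for _ in range(15)]
score[7][7] = 1.0
tight = spread(score, sigma=0.8, repeats=1)  # confined spread (assumed setting)
wide = spread(score, sigma=0.8, repeats=3)   # wider spread (assumed setting)
```

With these assumed settings, the single detection covers a 7x7 area in the confined case and an 11x11 area in the wider case, while the peak value is the same in both.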

Thereafter, every value is squared to put more emphasis on almost-detections and to devalue incomplete detections that are not even close (recall that the score value lies between zero and one). The score images are generally no larger than 160x120 pixels; hence, these are rather quick operations.
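A small worked example may make the effect of squaring concrete. The three score values are made up for illustration and do not come from the experiments.

```python
# Hypothetical scores in [0, 1]: an almost-detection, a middling score,
# and a not-even-close incomplete detection.
scores = [0.9, 0.5, 0.2]
squared = [s * s for s in scores]

# Ratio of the strongest to the weakest score before and after squaring.
ratio_before = scores[0] / scores[2]
ratio_after = squared[0] / squared[2]
```

The almost-detection drops only from 0.9 to about 0.81, whereas the weak score falls from 0.2 to about 0.04, so the ratio between them grows from 4.5 to roughly 20.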

Finally, every score image O_{s/t} is upscaled to the size of the original video frame and combined with the score images at all resolutions to yield P_{s/t}. Since we have fairly good knowledge of the expected scale of the hand in our application, we can constrain the search to such scales and avoid combining scores from much-too-large and much-too-small scales. The desired operation emphasizes detections at the same location in nearby scales, without penalizing detections only at a single scale. Hence, we add the scores, capping them at one. (A max operator would not emphasize, and multiplication

Fig. 3. Small vs. large “spreading” of incomplete detections
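The combination across scales described in this section (upscale each score image to the frame size, add the scores, cap at one) can be sketched as follows. Nearest-neighbour upscaling is an assumption made here for brevity, since the text does not name the interpolation method; the function names and the tiny example images are hypothetical as well.

```python
def upscale_nearest(img, out_w, out_h):
    """Nearest-neighbour upscaling (assumed; interpolation is not specified)."""
    h, w = len(img), len(img[0])
    return [[img[y * h // out_h][x * w // out_w] for x in range(out_w)]
            for y in range(out_h)]


def combine(score_images, out_w, out_h):
    """Add upscaled score images pixel by pixel, capping each sum at one."""
    total = [[0.0] * out_w for _ in range(out_h)]
    for img in score_images:
        up = upscale_nearest(img, out_w, out_h)
        for y in range(out_h):
            for x in range(out_w):
                total[y][x] = min(1.0, total[y][x] + up[y][x])
    return total


# Two hypothetical score images from nearby scales, at different resolutions.
scale_a = [[0.6, 0.0],
           [0.0, 0.3]]
scale_b = [[0.7, 0.0]]
p = combine([scale_a, scale_b], out_w=4, out_h=4)
```

In this example the top-left pixel is supported at both scales and saturates at 1.0, whereas a max operator would have returned only 0.7 there. The bottom-left pixel is supported at a single scale and keeps its score of 0.7 unchanged.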
