3.3 Per-Pixel Object Probability
VJ object detectors scan the classifier across the image, testing
rectangular areas at increments in x- and y-position. To detect an object at different
scales, either the image is down-scaled or the detector is up-scaled, typically by
10-25% per step. After scanning, our modified detector returns one “score image” per
scale, whose resolution equals the number of area tests in the x- and y-directions.
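As a rough illustration of how the score-image resolution follows from the scan parameters, the sketch below counts the area tests per scale. The frame size, the 25-pixel base window, the 4-pixel step, and the 1.2 scale factor are assumptions chosen for illustration and are not taken from the text; the sketch also assumes that the detector (rather than the image) is upscaled and that the step stays fixed.

```python
def score_image_size(img_w, img_h, win, step):
    """Return (nx, ny): the number of window positions tested in x and y."""
    if win > img_w or win > img_h:
        return (0, 0)  # window does not fit into the frame at this scale
    nx = (img_w - win) // step + 1
    ny = (img_h - win) // step + 1
    return (nx, ny)


def score_image_sizes(img_w, img_h, base_win, step, factor, n_scales):
    """Score-image resolution for each detector scale (detector upscaled)."""
    sizes = []
    for i in range(n_scales):
        win = round(base_win * factor ** i)  # hypothetical square window
        sizes.append(score_image_size(img_w, img_h, win, step))
    return sizes


# Hypothetical 640x480 frame, 25-pixel base window, 4-pixel step, 20% scale steps.
print(score_image_sizes(640, 480, 25, 4, 1.2, 3))
# -> [(154, 114), (153, 113), (152, 112)]
```

With these assumed values, each score image stays below the 160x120 bound mentioned later in this section.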
An object will typically be detected at multiple adjacent positions and frequently
also at nearby scales. The traditional VJ detector heuristically post-processes these
detections to combine them into one. Similarly, we devised a way to combine incomplete,
rather than binary, adjacent detections. This removes outliers and emphasizes
actual detections. To this end, every score image is first smoothed with a Gaussian.
Next, a grey-level morphology (dilation) spreads the point-wise detections to cover a
slightly larger area. The combination of the Gaussian covariance, the size of the
morphological structuring element, and the number of dilation repetitions should roughly
correspond to the size of the object of interest (the hand) within the rectangular VJ area.
We chose two configurations, one keeping the point detections rather confined (O_t),
and one “spreading” them out further (O_s, see Fig. 3 and Sec. 5). The resulting
point-symmetrical spread is appropriate for hands. Other objects, such as pedestrians, likely
benefit from a spread pattern in the shape of the object.
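The smoothing and dilation steps can be sketched as follows. This is a minimal NumPy-only illustration, not the authors' implementation: the square structuring element, the sigma value, and the repetition counts for the "tight" and "spread" variants are assumptions, and in practice a library routine such as SciPy's `scipy.ndimage.grey_dilation` would likely be used instead of the hand-written loop.

```python
import numpy as np


def gaussian_kernel(sigma, radius):
    """Normalized 1-D Gaussian kernel of length 2 * radius + 1."""
    x = np.arange(-radius, radius + 1)
    k = np.exp(-(x ** 2) / (2.0 * sigma ** 2))
    return k / k.sum()


def smooth(score, sigma=1.0):
    """Separable Gaussian smoothing; output has the same shape as the input."""
    radius = max(1, int(round(3 * sigma)))
    k = gaussian_kernel(sigma, radius)
    padded = np.pad(score, radius, mode="edge")
    # Filter along rows, then along columns ("valid" undoes the padding).
    tmp = np.apply_along_axis(lambda r: np.convolve(r, k, mode="valid"), 1, padded)
    return np.apply_along_axis(lambda c: np.convolve(c, k, mode="valid"), 0, tmp)


def grey_dilate(score, size=3, repetitions=1):
    """Grey-level dilation with a size x size square element (size must be odd)."""
    r = size // 2
    out = score
    for _ in range(repetitions):
        h, w = out.shape
        # Scores lie in [0, 1], so zero padding never wins the maximum.
        padded = np.pad(out, r, mode="constant", constant_values=0.0)
        shifted = [padded[dy:dy + h, dx:dx + w]
                   for dy in range(size) for dx in range(size)]
        out = np.max(shifted, axis=0)
    return out


def spread(score, sigma, size, repetitions):
    """Smooth, then dilate: one 'spreading' configuration."""
    return grey_dilate(smooth(score, sigma), size, repetitions)


# A single point detection in a small score image.
score = np.zeros((9, 9))
score[4, 4] = 1.0
tight = spread(score, sigma=1.0, size=3, repetitions=1)  # assumed "confined" setting
wide = spread(score, sigma=1.0, size=3, repetitions=3)   # assumed "spread" setting
```

More dilation repetitions enlarge the area covered by a detection without raising its peak value, which is the qualitative difference between the two configurations.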
Thereafter, every value is squared to put more emphasis on almost-detections and to
devalue not-even-close incomplete detections (remember that the score value is between
zero and one). The score images are generally no larger than 160x120 pixels, so
these operations are fast.
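A small sketch of the squaring step, using made-up score values, shows how it widens the gap between an almost-detection and a weak response. The function name `emphasize` is hypothetical.

```python
def emphasize(scores):
    """Square every score; values are assumed to lie in [0, 1]."""
    return [[v * v for v in row] for row in scores]


# Hypothetical scores: an almost-detection (0.9) and a weak response (0.3).
before = [[0.9, 0.3]]
after = emphasize(before)
# The ratio between the two grows from 3:1 to 9:1 (0.81 vs. 0.09).
```

Because all scores are at most one, squaring never increases a value; it only lowers weak responses proportionally more than strong ones.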
Finally, every score image O_{s/t} is upscaled to the size of the original video frame
and combined with the score images at all resolutions to yield P_{s/t}. Since we have fairly
good knowledge of the expected scale of the hand in our application, we can constrain
the search to such scales and avoid combining scores from much-too-large and
much-too-small scales. The desired operation emphasizes detections at the same location in
nearby scales, without penalizing detections only at a single scale. Hence, we add the
scores, capping them at one. (A max operator would not emphasize, and multiplication
Fig. 3. Small vs. large “spreading” of incomplete detections
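The add-and-cap combination across scales can be sketched as below. Nearest-neighbour upscaling and the tiny example arrays are assumptions made for illustration; the text does not state which interpolation is used.

```python
import numpy as np


def upscale_nearest(score, out_h, out_w):
    """Nearest-neighbour upscaling of a score image to (out_h, out_w)."""
    rows = np.arange(out_h) * score.shape[0] // out_h
    cols = np.arange(out_w) * score.shape[1] // out_w
    return score[np.ix_(rows, cols)]


def combine_scales(score_images, out_h, out_w):
    """Sum the upscaled score images of all scales and cap the result at one."""
    total = np.zeros((out_h, out_w))
    for s in score_images:
        total += upscale_nearest(s, out_h, out_w)
    return np.minimum(total, 1.0)


# Two hypothetical scales with different score-image resolutions.
coarse = np.zeros((2, 2))
coarse[0, 1] = 0.6          # detection seen at the coarse scale
fine = np.zeros((4, 4))
fine[0:2, 2:4] = 0.7        # same location, nearby scale
fine[3, 0] = 0.4            # detection seen at a single scale only

P = combine_scales([coarse, fine], 4, 4)
# Overlapping detections reinforce each other (0.6 + 0.7, capped at 1.0),
# while the single-scale detection keeps its score of 0.4 unpenalized.
```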