would be low-score dominated.) This yields an estimate P_{s/t}(x, y) of a pixel belonging
to the hand, irrespective of scale:

O_s = \gamma_s ( G_{3\times3} * \uparrow(D_i \oplus S_{7\times7}) )^2        (3)

O_t = \gamma_t ( G_{3\times3} * \uparrow(D_i \oplus S_{5\times5}) \oplus S_{5\times5} )^4        (4)

P_{s/t} = \min( 1, \sum_i O_{s/t} )        (5)

D_i are the incomplete detections at scale i (the image coordinates (x, y) are omitted), S is
an elliptical structuring element for dilation (\oplus), G is the Gaussian, \gamma a constant factor,
and \uparrow is the upscale operator.
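To make the computation concrete, the following Python sketch (using NumPy and OpenCV) implements one plausible reading of Eqs. (3)-(5): each per-scale detection map is dilated with an elliptical structuring element, upscaled to image resolution, smoothed with a 3x3 Gaussian, weighted and raised to a power, and the per-scale results are summed and clipped at 1. The function name, the default value of gamma and the exact operator order are assumptions for illustration, not taken from the paper.

import cv2
import numpy as np

def appearance_probability(detections, image_shape, gamma=0.1,
                           kernel_size=7, power=2):
    # detections  : list of 2-D float arrays D_i, one per detector scale i
    # image_shape : (height, width) of the full-resolution image
    # kernel_size, power: 7 and 2 roughly correspond to the spatial variant
    #                     O_s, 5 and 4 to the temporal variant O_t
    h, w = image_shape
    ellipse = cv2.getStructuringElement(cv2.MORPH_ELLIPSE,
                                        (kernel_size, kernel_size))
    accum = np.zeros((h, w), dtype=np.float32)
    for d in detections:
        o = cv2.dilate(d.astype(np.float32), ellipse)               # D_i dilated by S
        o = cv2.resize(o, (w, h), interpolation=cv2.INTER_LINEAR)   # upscale to image size
        o = cv2.GaussianBlur(o, (3, 3), 0)                          # 3x3 Gaussian smoothing
        accum += gamma * (o ** power)                               # gamma ( . )^power, summed over i
    return np.minimum(1.0, accum)                                   # P_{s/t} = min(1, sum_i O_{s/t})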
3.4 Multimodal Integration
The hand appearance probability calculated as described above, together with the grey-
level optical flow with flocking constraint from the feature tracking, and the particular
hand color learned at initial hand detection make for three largely orthogonal image
cues that need to be combined into one tracking result. We first combine the color
and appearance cues into a joint probability map which is then used to aid the feature
tracking.
Preliminary experiments with the joint probability of color and appearance (using
their minimum, maximum, weighted average, and product) found that treating the two
probabilities as statistically independent distributions and multiplying them yielded the
best results: P_hand = P_color \cdot P_{s/t}.
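In code, this independence-based fusion is simply a per-pixel product of the two probability maps; the helper below is a hypothetical sketch (names and array shapes assumed):

import numpy as np

def fuse_cues(p_color, p_st):
    # Joint hand probability under the independence assumption:
    # P_hand(x, y) = P_color(x, y) * P_{s/t}(x, y), both maps in [0, 1].
    assert p_color.shape == p_st.shape
    return p_color * p_st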
For fusion with the tracking information, we follow the same approach as with the
original FoF. If a feature is “lost” between frames because the image mark it tracked
disappeared or if it violated the flocking constraints, it is moved to a random area of
high appearance and color probability (p > 0.5). If this is not possible without repeated
violation of the flocking conditions, it is chosen randomly. Hence, this improved FoF
can take advantage of the object's appearance by relocating features to pixels that “look
like hand.” The result is an improvement to the feature re-localization method as the
previous approach could not distinguish between the object of interest and background
artifacts.
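One possible realization of this re-localization step is sketched below. It assumes a joint probability map p_hand in [0, 1], a list of surviving feature positions, and a simplified flocking constraint (minimum pairwise distance only); the rejection-sampling loop and all names are illustrative, not the authors' implementation.

import numpy as np

def relocate_feature(p_hand, features, min_dist, threshold=0.5,
                     max_tries=100, rng=np.random):
    # Move a lost feature to a random pixel that "looks like hand"
    # (joint probability above the threshold) while keeping a minimum
    # distance to all surviving features; fall back to a random pixel
    # if no such position can be found after repeated attempts.
    h, w = p_hand.shape
    ys, xs = np.nonzero(p_hand > threshold)          # candidate hand-like pixels
    for _ in range(max_tries):
        if len(xs) > 0:
            k = rng.randint(len(xs))
            cand = np.array([xs[k], ys[k]], dtype=float)
        else:                                        # no hand-like pixels at all
            cand = np.array([rng.randint(w), rng.randint(h)], dtype=float)
        # simplified flocking condition: keep features from collapsing
        if all(np.linalg.norm(cand - np.asarray(f)) >= min_dist for f in features):
            return cand
    # repeated violations of the flocking condition: choose randomly
    return np.array([rng.randint(w), rng.randint(h)], dtype=float)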
As with the original FoF, this method leads to a natural multimodal integration, com-
bining cues from feature movement based on grey-level image texture with cues from
texture-less skin color probability and object-specific texture. Their relative contribu-
tion is determined by the desired match quality for features between frames. If this
threshold is low, features are relocated more frequently, raising the importance of the
color and appearance modalities, and vice versa.
4 Experiments
We compared the performance of the traditional FoF tracking to two parameterizations
of FoF with appearance-based prior. We also investigated whether the appearance cue
could replace the color cue entirely. The features and color information were initialized