would be low-score dominated.) This yields an estimate P_{s/t}(x, y) of a pixel belonging
to the hand, irrespective of scale:

O_s = \gamma_s ( G_{3\times3} * \uparrow(D_i \oplus S_{7\times7}) )^2        (3)

O_t = \gamma_t ( G_{3\times3} * \uparrow(D_i \oplus S_{5\times5}) \oplus S_{5\times5} )^4        (4)

P_{s/t} = \min( 1, \sum_i O_{s/t} )        (5)

D_i are the incomplete detections at scale i (the image coordinates (x, y) are omitted), S is
an elliptical structuring element for dilation (\oplus), G is the Gaussian, \gamma a constant factor,
and \uparrow is the upscale operator.
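To make the computation concrete, the following Python sketch (using NumPy and OpenCV) implements one plausible reading of Eqs. (3)-(5): each per-scale detection map is dilated with an elliptical structuring element, upscaled to image resolution, smoothed with a 3x3 Gaussian, weighted and raised to a power, and the per-scale results are summed and clipped at 1. The function name, the default value of gamma and the exact operator order are assumptions for illustration, not taken from the paper.

import cv2
import numpy as np

def appearance_probability(detections, image_shape, gamma=0.1,
                           kernel_size=7, power=2):
    # detections  : list of 2-D float arrays D_i, one per detector scale i
    # image_shape : (height, width) of the full-resolution image
    # kernel_size, power: 7 and 2 roughly correspond to the spatial variant
    #                     O_s, 5 and 4 to the temporal variant O_t
    h, w = image_shape
    ellipse = cv2.getStructuringElement(cv2.MORPH_ELLIPSE,
                                        (kernel_size, kernel_size))
    accum = np.zeros((h, w), dtype=np.float32)
    for d in detections:
        o = cv2.dilate(d.astype(np.float32), ellipse)               # D_i dilated by S
        o = cv2.resize(o, (w, h), interpolation=cv2.INTER_LINEAR)   # upscale to image size
        o = cv2.GaussianBlur(o, (3, 3), 0)                          # 3x3 Gaussian smoothing
        accum += gamma * (o ** power)                               # gamma ( . )^power, summed over i
    return np.minimum(1.0, accum)                                   # P_{s/t} = min(1, sum_i O_{s/t})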
3.4 Multimodal Integration
The hand appearance probability calculated as described above, together with the grey-
level optical flow with flocking constraint from the feature tracking, and the particular
hand color learned at initial hand detection make for three largely orthogonal image
cues that need to be combined into one tracking result. We first combine the color
and appearance cues into a joint probability map which is then used to aid the feature
tracking.
Preliminary experiments with the joint probability of color and appearance (using
their minimum, maximum, weighted average, and product) found that treating the two
probabilities as statistically independent distributions and multiplying them yielded the
best results: P_hand = P_color \cdot P_{s/t}.
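In code, this independence-based fusion is simply a per-pixel product of the two probability maps; the helper below is a hypothetical sketch (names and array shapes assumed):

import numpy as np

def fuse_cues(p_color, p_st):
    # Joint hand probability under the independence assumption:
    # P_hand(x, y) = P_color(x, y) * P_{s/t}(x, y), both maps in [0, 1].
    assert p_color.shape == p_st.shape
    return p_color * p_st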
For fusion with the tracking information, we follow the same approach as with the
original FoF. If a feature is “lost” between frames because the image mark it tracked
disappeared or if it violated the flocking constraints, it is moved to a random area of
high appearance and color probability (p > 0.5). If this is not possible without repeated
violation of the flocking conditions, it is chosen randomly. Hence, this improved FoF
can take advantage of the object's appearance by relocating features to pixels that “look
like hand.” The result is an improvement to the feature re-localization method as the
previous approach could not distinguish between the object of interest and background
artifacts.
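One possible realization of this re-localization step is sketched below. It assumes a joint probability map p_hand in [0, 1], a list of surviving feature positions, and a simplified flocking constraint (minimum pairwise distance only); the rejection-sampling loop and all names are illustrative, not the authors' implementation.

import numpy as np

def relocate_feature(p_hand, features, min_dist, threshold=0.5,
                     max_tries=100, rng=np.random):
    # Move a lost feature to a random pixel that "looks like hand"
    # (joint probability above the threshold) while keeping a minimum
    # distance to all surviving features; fall back to a random pixel
    # if no such position can be found after repeated attempts.
    h, w = p_hand.shape
    ys, xs = np.nonzero(p_hand > threshold)          # candidate hand-like pixels
    for _ in range(max_tries):
        if len(xs) > 0:
            k = rng.randint(len(xs))
            cand = np.array([xs[k], ys[k]], dtype=float)
        else:                                        # no hand-like pixels at all
            cand = np.array([rng.randint(w), rng.randint(h)], dtype=float)
        # simplified flocking condition: keep features from collapsing
        if all(np.linalg.norm(cand - np.asarray(f)) >= min_dist for f in features):
            return cand
    # repeated violations of the flocking condition: choose randomly
    return np.array([rng.randint(w), rng.randint(h)], dtype=float)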
As with the original FoF, this method leads to a natural multimodal integration, com-
bining cues from feature movement based on grey-level image texture with cues from
texture-less skin color probability and object-specific texture. Their relative contribu-
tion is determined by the desired match quality for features between frames. If this
threshold is low, features are relocated more frequently, raising the importance of the
color and appearance modalities, and vice versa.
4 Experiments
We compared the performance of the traditional FoF tracking to two parameterizations
of FoF with appearance-based prior. We also investigated whether the appearance cue
could replace the color cue entirely. The features and color information were initialized