in the same fashion for all configurations, through automatic detection of an “initialization”
posture (see [8]). We did not compare against the CamShift tracker [2], since the
superior performance of traditional FoF tracking over CamShift has already been shown [8].
4.1 Video Sequences
We recorded a total of 16,042 frames of video footage in 13 sequences, over 10 minutes
in total, including five of the sequences from [8]. Each sequence follows the motions of
the right hand of one of three people, some filmed from the performer's point of view,
some from an observer's point of view. The videos were shot in an office, a lab, and a
hallway as well as at various outdoor locations in front of walkways, vegetation, walls
and other common scenes. The videos were recorded with a hand-held DV camcorder,
a webcam, and a digital still camera in video mode, then copied to our computer and
processed in real time. A sample video (excerpts from sequence 12) is available from
our web site (footnote 2), showing FoF tracking (big and little dots), the color model
backprojection (in white), and the appearance prior. The appearance-based probability is
shown in cyan, overlaid on a red edge image for better visibility. (The edges were not used
for any calculation.)
5 Results
Following the FoF evaluation [8], we consider tracking to be lost when the mean
location (the big dot) is no longer on the hand. The wrist is not considered part of the hand.
Tracking for the sequence was stopped at that point, even though the hand might later
coincidentally “catch” the tracked feature points again. Since the average feature location
is not guaranteed to lie at the center of the hand or on any other particular part, the
distance between the tracked location and ground-truth data is not an accurate criterion
for tracking loss. We therefore visually inspected and manually annotated every video
sequence.
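The loss criterion above can be illustrated with a minimal sketch. It assumes that the manual annotation of a sequence is stored as one hypothetical boolean flag per frame (`on_hand`), recording whether the mean feature location lies on the hand (wrist excluded). The function `normalized_tracking_time` is illustrative only, not the authors' evaluation code. It returns the fraction of frames tracked before the first loss and deliberately ignores any later "recapture", mirroring the rule that tracking for a sequence stops at the first loss.

```python
from typing import Sequence


def normalized_tracking_time(on_hand: Sequence[bool]) -> float:
    """Fraction of a sequence tracked before the first tracking loss.

    on_hand[i] is a hypothetical per-frame annotation: True if the mean
    feature location in frame i lies on the hand (wrist excluded).
    Tracking counts as lost at the first False frame; later frames where
    the hand coincidentally "catches" the features again are ignored.
    """
    if not on_hand:
        raise ValueError("empty sequence")
    for frame, ok in enumerate(on_hand):
        if not ok:
            # Frames 0 .. frame-1 were tracked successfully.
            return frame / len(on_hand)
    return 1.0  # never lost: tracked for the whole sequence


# Hypothetical 10-frame annotation: lost at frame 6, "recaptured" at frame 8.
flags = [True] * 6 + [False, False, True, True]
print(normalized_tracking_time(flags))  # 0.6
```

Counting in frames rather than seconds gives the same normalized value as long as the frame rate is constant within a sequence.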
Fig. 4 shows the time until tracking was lost, normalized to the length of the video
sequence. The rightmost bars show the average over all sequences. The appearance-augmented
FoF (with the larger spread, see below) tracks the hand on average 13.9% longer than the
original FoF. As expected, appearance-based FoF can handle some cases in which both
the flocking and the color modalities break down. Fig. 2 shows two screenshots from
sequence 12 in which the hand is in front of a walkway and color segmentation yields
a poor result. The hand's appearance, however, is visibly distinct from the
background, and our method assigns a high probability to hand pixels. LK feature
tracking fails shortly afterwards, and only re-localization on high appearance probabilities
allows hand tracking to continue successfully.
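The averaging behind the rightmost bars of Fig. 4 and the "13.9% longer" comparison can be sketched as follows. This assumes that "longer on average" means the relative increase of the mean normalized tracking time across sequences; the text does not state whether a relative or an absolute (percentage-point) difference is meant. The function name `relative_gain` is hypothetical, and the per-sequence values are invented for illustration, not taken from the paper.

```python
from statistics import mean


def relative_gain(baseline: list[float], improved: list[float]) -> float:
    """Relative increase of the mean normalized tracking time.

    Both lists hold one normalized time (0..1) per sequence, in the same
    sequence order. Returns mean(improved) / mean(baseline) - 1.
    """
    if len(baseline) != len(improved) or not baseline:
        raise ValueError("need equal-length, non-empty lists")
    base = mean(baseline)
    if base == 0:
        raise ZeroDivisionError("baseline never tracked")
    return mean(improved) / base - 1.0


# Hypothetical per-sequence values (NOT the paper's data).
fof = [0.50, 0.80, 1.00, 0.30]
fof_appearance = [0.60, 0.90, 1.00, 0.50]
print(f"{relative_gain(fof, fof_appearance):.1%}")  # 15.4%
```

Under the percentage-point reading, one would instead report `mean(improved) - mean(baseline)`, which is 0.10 (10 points) for the same hypothetical values.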
5.1 Spreading Incomplete Detections
Incomplete detections are post-processed as explained in Sec. 3.3. We experimented
with two sets of parameters, shown in Eq. 3 (O_s, larger spread) and Eq. 4 (O_t, smaller
2 http://www.movesinstitute.org/%7Ekolsch/paper241Video.wmv