Hahn et al. (2007) show that the MOCCD algorithm yields significantly
more accurate and robust tracking results than the three-dimensional extension of
the contour tracking approach of Blake and Isard (1998) and the three-dimensional
active contour technique of d'Angelo et al. (2004), in particular because the latter two
tracking systems tend to lose the object after a few tens of frames.
A direct comparison is possible with the monocular approach of Schmidt et al.
(2006). Their system is evaluated by Schmidt (2009) on an image sequence which
is similar to our sequence 1; it was acquired with the same camera system as the one
used in this section and shows the same test person and the same background at the
same distance. The only difference is that Schmidt (2009) uses the original colour
images. The system is initialised manually and obtains an average positional error of
210 mm on the monocular colour image sequence, which is about three times as large
as the errors of our configurations 4 and 5. Not surprisingly, the positional error
of the monocular system is most pronounced in the direction parallel to the depth
axis, where it amounts to 160 mm on average. Furthermore, the system is unable to
detect slight movements of the hand, e.g. those that occur when tightening a screw.
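
As an illustration of the two figures quoted above, the following minimal sketch computes an average positional error and its component along the depth axis from tracked and reference 3D positions. It assumes that the positional error is the Euclidean distance between estimated and ground-truth positions and that the depth axis is the third coordinate; neither is stated in this section. The function name `positional_errors` and the toy data are hypothetical.

```python
import math


def positional_errors(estimated, ground_truth, depth_axis=2):
    """Return (mean Euclidean error, mean absolute error along the depth axis).

    Both inputs are sequences of (x, y, z) positions, one per frame, in the
    same unit (e.g. mm). The error definition is an assumption for
    illustration, not necessarily the one used in the cited evaluations.
    """
    if len(estimated) != len(ground_truth) or len(estimated) == 0:
        raise ValueError("need two non-empty sequences of equal length")

    euclid_sum = 0.0
    depth_sum = 0.0
    for est, ref in zip(estimated, ground_truth):
        diff = [e - r for e, r in zip(est, ref)]
        euclid_sum += math.sqrt(sum(d * d for d in diff))  # full 3D error
        depth_sum += abs(diff[depth_axis])                 # depth component only

    n = len(estimated)
    return euclid_sum / n, depth_sum / n


# Toy data in mm (made up for illustration, not taken from any cited work).
ground_truth = [(0.0, 0.0, 3000.0), (10.0, 0.0, 3000.0)]
estimated = [(30.0, 40.0, 3000.0), (10.0, 0.0, 3120.0)]

mean_error, mean_depth_error = positional_errors(estimated, ground_truth)
print(f"mean positional error: {mean_error:.1f} mm")
print(f"mean depth-axis error: {mean_depth_error:.1f} mm")
```

Reporting the depth component separately, as done here, makes it visible when most of the total error stems from the direction along the optical axis.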
Further quantitative comparisons are only possible with systems using
long-baseline multiple-camera setups. Rosenhahn et al. (2005) compare their tracking
results obtained with a four-camera setup with ground truth data recorded by a
marker-based system relying on eight cameras. In their scenario, the test persons
wear tight-fitting clothes and the body model is adapted to each individual test
person. The obtained accuracies are much higher than those achieved in this section.
The average error of the elbow joint angle amounts to only 1.3°, which is mainly due
to the detailed modelling of the test person, the fairly simple background, and the
long-baseline multiple-camera setup, which strongly alleviates the problem of
occlusions. Similar accuracies are obtained in the presence of more complex backgrounds
by Rosenhahn et al. (2008a, 2008b), who also rely on individually calibrated body
models.
Mündermann et al. (2008) examine camera configurations with 4, 8, 16, 32,
and 64 cameras. The average positional error of their system corresponds to
10-30 mm, where the accuracy increases with an increasing number of cameras. Their
positional accuracies are again higher than those achieved in this section, which is
mainly due to our restriction to the small-baseline camera system required by our
application scenario.
A three-camera setup with baselines of several metres is used by Hofmann and
Gavrila (2009), who evaluate their system on 12 test persons and obtain an average
positional accuracy of 100-130 mm in a fairly complex real-world environment.
Their results are comparable to those obtained in this section, although the baseline
of their camera setup is wider by a factor of 10-30. On the other hand, the distance
of the test persons to the camera corresponds to approximately 10-15 m, which is
larger by about a factor of three than in our scenario.
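
The interplay of baseline and distance described above can be made explicit with the standard first-order depth uncertainty of a rectified stereo pair. This is only a rough sketch: it assumes that both systems share the same focal length $f$ and disparity error $\delta d$, which this section does not state, and it ignores that the cited system uses three cameras and a different tracking method. The symbols $Z$ (object distance), $b$ (baseline), and $k$ (baseline ratio) are introduced here for illustration.

```latex
\delta Z \approx \frac{Z^{2}}{f\,b}\,\delta d
\quad\Longrightarrow\quad
\frac{\delta Z_{\text{wide}}}{\delta Z_{\text{small}}}
\approx \frac{(3Z)^{2} / (k\,b)}{Z^{2} / b}
= \frac{9}{k},
\qquad
k \in [10, 30]
\;\Rightarrow\;
\frac{9}{k} \in [0.3,\ 0.9].
```

Under these assumptions, the threefold distance offsets much of the advantage of the wider baseline, which is in line with both systems reaching positional errors of the same order of magnitude.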
On the HumanEva data set (Sigal and Black, 2006), an average positional error of
30-40 mm with a standard deviation of about 5 mm is obtained by Gall et al. (2009).
In particular, the small standard deviation illustrates the high robustness of their
approach. However, similar to Rosenhahn et al. (2005, 2008a, 2008b), their system