continuously investigated and has found a large number of real applications. In general,
one can adopt two main strategies for the recovery of 3-D structure from passive
image data, structure from motion (SFM) and stereo vision [13], each of which
relies on the acquisition of image data of the same scene from different viewpoints.
In the case of SFM, one can also recover the ego-motion of the sensor with
respect to a world coordinate system, performing 3-D scene reconstruction and nav-
igation [38, 35]. This implies knowledge of corresponding locations in the several
images. Lepetit and Fua [17] have described the general principles of feature de-
tection, tracking and 3-D reconstruction, and Oliensis [22] gave an earlier detailed
critique of the comparative strengths and weaknesses of several well-documented
approaches to SFM. The majority of existing methods for feature tracking use
frame-to-frame prediction models, based for example on Kalman filtering [14, 20],
particle filtering [25], and optimisation-based approaches [33, 21, 34]. One of the
numerous examples is the MonoSLAM system developed by Davison et al. [8],
who utilised a probabilistic feature-based map that represents a snapshot of the
current estimates of the states of the camera and of all the feature points. This feature
map was initialised at system start-up and updated frame by frame by an extended
Kalman filter. The state estimates of the camera and
feature points were also updated during camera motion.
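To make the frame-to-frame prediction idea concrete, the following is a minimal sketch of a constant-velocity Kalman filter for a single tracked 2-D image feature. It is an illustration only, not the MonoSLAM formulation (whose state vector also contains the camera pose), and the noise covariances are assumed values; the predicted position is what defines the centre of the search window in the next frame.

```python
import numpy as np

# Illustrative constant-velocity Kalman filter for one 2-D feature.
# State x = [u, v, du, dv] (pixel position and per-frame velocity).
dt = 1.0  # one frame
F = np.array([[1, 0, dt, 0],
              [0, 1, 0, dt],
              [0, 0, 1, 0],
              [0, 0, 0, 1]], dtype=float)   # state transition
H = np.array([[1, 0, 0, 0],
              [0, 1, 0, 0]], dtype=float)   # only the position is observed
Q = 0.01 * np.eye(4)                        # process noise (assumed)
R = 1.0 * np.eye(2)                         # measurement noise (assumed)

def predict(x, P):
    """Predict the feature state for the next frame; the predicted
    position gives the centre of the search window for matching."""
    return F @ x, F @ P @ F.T + Q

def update(x, P, z):
    """Correct the prediction with the matched pixel position z."""
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)          # Kalman gain
    x = x + K @ (z - H @ x)
    P = (np.eye(4) - K @ H) @ P
    return x, P

# One tracking step: predict, then fuse the new measurement.
x = np.array([100.0, 50.0, 2.0, -1.0])      # initial position and velocity
P = np.eye(4)
x, P = predict(x, P)                         # predicted position: (102, 49)
x, P = update(x, P, np.array([102.5, 48.8]))
```

The corrected estimate lands between the prediction and the measurement, weighted by the gain; extending this per-feature scheme to a joint camera-and-map state is what distinguishes a SLAM-style filter from independent trackers.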
In spite of their success in certain applications, these established approaches do
not take into account the long-term history of the camera motion. In contrast, we
track features and recover ego-motion and 3-D structure from a temporal image
sequence acquired by a single camera mounted on a moving pedestrian. Our con-
tribution is to show that the use of an explicit longer term, non-linear human gait
model is more efficient in this case. Fewer features are lost and the processing time
per frame is lessened as either the search window or the frame rate can be reduced.
Our work was motivated by the study reported in [19], where Molton et al. used
a robotic system to make measurements on the gait of a walking person, while a
digital compass and an inclinometer were used to record rotation. An iterated and
extended Kalman filter was used in their work to initialise the wavelet parameter
estimates and was then run across the whole period of activity. In our work, the motion
is computed directly from the video data, and as already stated, the emphasis is on
the use of a longer term model to increase algorithmic efficiency. In what follows,
we use the term “ego-motion” to refer to both frame-to-frame and longer-term
periodic motion. The expression “camera transformation” refers to the ego-motion of
the camera between any two frames, and the expression “gait model” refers to the
longer-term ego-motion of the camera over many frames.
In Section 2 we give an overview of our approach, which has two phases, ini-
tialisation and continuous tracking. We then expand on the key components of our
strategy in Sections 3 and 4. In Section 3 we discuss the GLS method to recover the
scene geometry and ego-motion within Phase 1, effectively finding structure from
motion (SFM) [36]. In Section 4 we discuss the formulation of the dynamic gait
model within the MAP-EM framework of Phase 2 to continually perform SFM with
improved efficiency while periodically updating the gait parameters. In Section 5,
 