continuously investigated and has found a large number of real applications. In general,
one can adopt two main strategies for the recovery of 3-D structure from passive
image data, structure from motion (SFM) and stereo vision [13], each of which
relies on the acquisition of image data of the same scene from different viewpoints.
In the case of SFM, one can also recover the ego-motion of the sensor with
respect to a world coordinate system, performing 3-D scene reconstruction and nav-
igation [38, 35]. This implies knowledge of corresponding locations in the several
images. Lepetit and Fua [17] have described the general principles of feature de-
tection, tracking and 3-D reconstruction, and Oliensis [22] gave an earlier detailed
critique of the comparative strengths and weaknesses of several well-documented
approaches to SFM. The majority of existing methods for feature tracking use
frame-to-frame prediction models, based for example on Kalman filtering [14, 20],
particle filtering [25], and optimisation-based approaches [33, 21, 34]. One of the
numerous examples is the MonoSLAM system developed by Davison et al. [8],
who utilised a probabilistic feature-based map that represents a snapshot of the
current estimates of the states of the camera and of all the feature points. This feature
map was initialised at system start-up and updated frame by frame by an extended
Kalman filter. The state estimates of the camera and
feature points were also updated during camera motion.
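To make the frame-to-frame prediction idea concrete, the following is a minimal sketch of a constant-velocity Kalman filter for a single tracked 2-D image feature. It is an illustration only, not the MonoSLAM formulation (whose state vector also contains the camera pose), and the noise covariances are assumed values; the predicted position is what defines the centre of the search window in the next frame.

```python
import numpy as np

# Illustrative constant-velocity Kalman filter for one 2-D feature.
# State x = [u, v, du, dv] (pixel position and per-frame velocity).
dt = 1.0  # one frame
F = np.array([[1, 0, dt, 0],
              [0, 1, 0, dt],
              [0, 0, 1, 0],
              [0, 0, 0, 1]], dtype=float)   # state transition
H = np.array([[1, 0, 0, 0],
              [0, 1, 0, 0]], dtype=float)   # only the position is observed
Q = 0.01 * np.eye(4)                        # process noise (assumed)
R = 1.0 * np.eye(2)                         # measurement noise (assumed)

def predict(x, P):
    """Predict the feature state for the next frame; the predicted
    position gives the centre of the search window for matching."""
    return F @ x, F @ P @ F.T + Q

def update(x, P, z):
    """Correct the prediction with the matched pixel position z."""
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)          # Kalman gain
    x = x + K @ (z - H @ x)
    P = (np.eye(4) - K @ H) @ P
    return x, P

# One tracking step: predict, then fuse the new measurement.
x = np.array([100.0, 50.0, 2.0, -1.0])      # initial position and velocity
P = np.eye(4)
x, P = predict(x, P)                         # predicted position: (102, 49)
x, P = update(x, P, np.array([102.5, 48.8]))
```

The corrected estimate lands between the prediction and the measurement, weighted by the gain; extending this per-feature scheme to a joint camera-and-map state is what distinguishes a SLAM-style filter from independent trackers.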
In spite of their success in certain applications, these established approaches do
not take into account the long-term history of the camera motion. In contrast, we
track features and recover ego-motion and 3-D structure from a temporal image
sequence acquired by a single camera mounted on a moving pedestrian. Our con-
tribution is to show that the use of an explicit longer term, non-linear human gait
model is more efficient in this case. Fewer features are lost and the processing time
per frame is lessened as either the search window or the frame rate can be reduced.
Our work was motivated by the study reported in [19], where Molton et al. used
a robotic system to make measurements on the gait of a walking person, while a
digital compass and an inclinometer were used to record rotation. An iterated and
extended Kalman filter was used in their work to initialise the wavelet parameter
estimates and was then run across the whole period of activity. In our work, the motion
is computed directly from the video data, and as already stated, the emphasis is on
the use of a longer term model to increase algorithmic efficiency. In what follows,
we use the term “ego-motion” to refer to both frame-to-frame and longer-term
periodic motion. The expression “camera transformation” refers to the ego-motion of
the camera between any two frames, and the expression “gait model” refers to the
longer-term ego-motion of the camera over many frames.
In Section 2 we give an overview of our approach, which has two phases, ini-
tialisation and continuous tracking. We then expand on the key components of our
strategy in Sections 3 and 4. In Section 3 we discuss the GLS method to recover the
scene geometry and ego-motion within Phase 1, effectively finding structure from
motion (SFM) [36]. In Section 4 we discuss the formulation of the dynamic gait
model within the MAP-EM framework of Phase 2 to continually perform SFM with
improved efficiency while periodically updating the gait parameters. In Section 5,
 