Multicamera Action Recognition with Canonical
Correlation Analysis and Discriminative
Sequence Classification
Rodrigo Cilla, Miguel A. Patricio, Antonio Berlanga, and Jose M. Molina
Computer Science Department. Universidad Carlos III de Madrid
Avda. de la Universidad Carlos III, 22. 28270 Colmenarejo, Madrid, Spain
{rcilla,mpatrici}@inf.uc3m.es, {berlanga,molina}@ia.uc3m.es
Abstract. This paper presents a feature fusion approach to the recognition of human actions from multiple cameras that avoids the computation of the 3D visual hull. Action descriptors are extracted for each available camera view and projected into a common subspace that maximizes the correlation between the components of the projections. That common subspace is learned using Probabilistic Canonical Correlation Analysis, and action classification is performed in it with a discriminative classifier. Results of the proposed method are reported on the IXMAS dataset.
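The fusion step described above can be illustrated with a minimal sketch of classical (non-probabilistic) CCA in NumPy: descriptor matrices from two camera views are projected into a shared subspace where corresponding components are maximally correlated. The function name `cca`, the ridge term `reg`, and the synthetic two-view data are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def cca(X, Y, n_components=2, reg=1e-6):
    """Classical CCA via SVD of the whitened cross-covariance.

    X, Y are (n_samples, n_features) descriptor matrices from two
    views. Returns projection matrices Wx, Wy mapping each view into
    a shared subspace, plus the canonical correlations. A small ridge
    term `reg` keeps the covariance inverses numerically stable.
    """
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    n = X.shape[0]
    Cxx = X.T @ X / (n - 1) + reg * np.eye(X.shape[1])
    Cyy = Y.T @ Y / (n - 1) + reg * np.eye(Y.shape[1])
    Cxy = X.T @ Y / (n - 1)

    def inv_sqrt(C):
        # symmetric inverse square root via eigendecomposition
        w, V = np.linalg.eigh(C)
        return V @ np.diag(1.0 / np.sqrt(w)) @ V.T

    Kx, Ky = inv_sqrt(Cxx), inv_sqrt(Cyy)
    U, s, Vt = np.linalg.svd(Kx @ Cxy @ Ky)
    Wx = Kx @ U[:, :n_components]
    Wy = Ky @ Vt.T[:, :n_components]
    return Wx, Wy, s[:n_components]

# Two noisy views of the same latent action signal (synthetic data)
rng = np.random.default_rng(0)
z = rng.standard_normal((200, 2))                      # shared latent
X = z @ rng.standard_normal((2, 5)) + 0.1 * rng.standard_normal((200, 5))
Y = z @ rng.standard_normal((2, 4)) + 0.1 * rng.standard_normal((200, 4))
Wx, Wy, corrs = cca(X, Y)
u = (X - X.mean(axis=0)) @ Wx[:, 0]
v = (Y - Y.mean(axis=0)) @ Wy[:, 0]
```

In the paper's setting the projected coordinates (here `u`, `v`) would then be fed to a discriminative sequence classifier; the probabilistic formulation additionally yields a generative latent-variable model of the shared subspace.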
1 Introduction
The recognition of human actions has received increasing attention from the computer vision community in recent years [10]. One of the current trends in the field is how to efficiently combine the perceptions grabbed from different viewpoints in order to build more robust action recognition systems. Such a system can cover wider scenes and cope with the occlusions caused by walls and furniture that would make recognition from a single view very difficult, if not impossible.
Although human action recognition systems have been proposed at the different sensor fusion levels defined by Dasarathy [5], such as [4,15] at the decision-in decision-out level or [22,12] at the feature-in decision-out level, the most successful approaches operate at the feature-in feature-out level. These approaches extract human silhouettes from the different cameras, using for example background subtraction [16], and then reconstruct the 3D visual hull of the human [9] as the feature used for recognition. Along this line, Weinland et al. [21] proposed the Motion History Volumes (MHV) as a 3D extension of the popular Motion History Image (MHI) [3]; action classification is then made using Fourier analysis of the MHV. Peng et al. [13] performed multilinear analysis of the voxels in the visual hull. Turaga et al. [19] studied visual hulls using Stiefel and Grassmann manifolds, reporting the best action recognition results in 3D to date. The main drawback of these methods is that 3D visual hull reconstruction has a high computational cost.