burden and requires accurate calibration parameters of each one of the cameras
observing the scene. Also, the computation of the 3D visual hull requires
silhouettes from at least two different camera viewpoints.
This work presents a novel method for the recognition of human actions using
multiple cameras at the feature fusion level but without explicitly reconstructing
the visual hull or other 3D descriptor. Experimental results reported in the liter-
ature [19,13] have shown that visual hulls can be projected into low dimensional
manifolds where most of their variance is preserved. Moreover, a silhouette is the
projection of a visual hull into the camera plane, and different works [20] have
also reported that they can be parametrized into low dimensional manifolds. The
aim of our method is to find a set of projection functions, one for each camera,
that project the corresponding silhouettes into a common low dimensional
manifold. We argue that representing the silhouettes in that common low
dimensional manifold is equivalent to the low dimensional representation
of the visual hull, so similar results can be achieved in human action recognition.
Probabilistic continuous latent variable models provide a framework for man-
ifold learning where low and high dimensional representations are related via
the factorization of their joint probability distribution. We use the
Probabilistic Canonical Correlation Analysis (PCCA) model [2] to learn the
projections of the features observed at the different cameras into a subspace
that maximizes the correlation between their components. The representation
of the observed features in that subspace is then used for action sequence
classification.
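For reference, the PCCA generative model of [2] ties the per-view observations to a single shared latent variable (notation ours; [2] derives the two-view case, and the multi-view form below is its natural extension):

\[
z \sim \mathcal{N}(0, I_d), \qquad
x_c \mid z \sim \mathcal{N}(W_c z + \mu_c, \Psi_c), \quad c = 1, \ldots, C,
\]

so each camera's descriptor is modeled as a noisy linear image of the same low dimensional latent action descriptor, and for two views the maximum likelihood estimates of the loading matrices $W_c$ span the classical canonical directions up to a linear transform [2].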
The paper is organized as follows: section 2 presents the structure of the
proposed system; section 3 reviews the Canonical Correlation Analysis model;
section 4 describes the sequence classifier used to test the system;
section 5 reports experimental validation of the method; finally, section 6
discusses the conclusions and future lines of work.
2 System Overview
The architecture of the proposed system is shown in figure 1. The images
grabbed by the C different cameras observing the scene are independently
processed to extract a sequence of action descriptors X^c = {x^c_1, ..., x^c_T},
1 <= c <= C, where T is the total number of frames grabbed. The C sequences of
action descriptors are then fused by projecting them into a common subspace,
giving a sequence of common action descriptors Z = {z_1, ..., z_T}, with
z_t = F(x^1_t, ..., x^C_t). Finally, each sequence is fed into an action
classifier to decide which action is being performed in the sequence.
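The fusion step can be sketched as follows. This is a minimal illustration, not the authors' implementation: the per-camera projection matrices W[c] are assumed already learned (e.g. by the CCA model of section 3), the descriptors are random placeholders, and averaging the projected views is one simple choice for the fusion function F.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical setup: C cameras, T frames, D-dimensional silhouette
# descriptors, and per-camera projection matrices W[c] (D x d) assumed
# to be already learned.
C, T, D, d = 3, 10, 40, 5
X = [rng.normal(size=(T, D)) for _ in range(C)]   # per-camera sequences X^c
W = [rng.normal(size=(D, d)) for _ in range(C)]   # learned projections

# Fusion F: project each camera's descriptors into the common subspace
# and average over cameras, yielding the common sequence Z = {z_1,...,z_T}.
Z = np.mean([X[c] @ W[c] for c in range(C)], axis=0)
print(Z.shape)  # one common low-dimensional descriptor per frame: (T, d)
```

The resulting T x d matrix Z is what would then be handed to the sequence classifier of section 4.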
3 Canonical Correlation Analysis
Canonical Correlation Analysis is the method we use to fuse the action
descriptors. In the following paragraphs we give an overview of its classical
and probabilistic formulations.
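As a self-contained sanity check of the classical formulation (a sketch, not the paper's code), CCA can be computed by whitening each view's covariance and taking an SVD of the whitened cross-covariance; the singular values are the canonical correlations. The two synthetic views below share a 2-D latent signal, so the leading correlations should be close to 1:

```python
import numpy as np

def cca(X, Y, k):
    """Classical CCA: find Wx, Wy maximizing corr(X @ Wx, Y @ Wy)."""
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    n = len(X)
    Cxx = X.T @ X / n + 1e-6 * np.eye(X.shape[1])  # regularized covariances
    Cyy = Y.T @ Y / n + 1e-6 * np.eye(Y.shape[1])
    Cxy = X.T @ Y / n

    def inv_sqrt(C):  # symmetric inverse square root via eigendecomposition
        w, V = np.linalg.eigh(C)
        return V @ np.diag(w ** -0.5) @ V.T

    Mx, My = inv_sqrt(Cxx), inv_sqrt(Cyy)
    # Singular values of the whitened cross-covariance = canonical correlations.
    U, s, Vt = np.linalg.svd(Mx @ Cxy @ My)
    return Mx @ U[:, :k], My @ Vt[:k].T, s[:k]

# Two views driven by the same 2-D latent signal (synthetic data).
rng = np.random.default_rng(0)
T = 500
z = rng.normal(size=(T, 2))
X = z @ rng.normal(size=(2, 8)) + 0.1 * rng.normal(size=(T, 8))
Y = z @ rng.normal(size=(2, 6)) + 0.1 * rng.normal(size=(T, 6))
Wx, Wy, corrs = cca(X, Y, 2)
```

Projecting each view with its own matrix (X @ Wx, Y @ Wy) gives maximally correlated coordinates, which is exactly the role the projections play in the fusion step of section 2.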
 