burden and requires accurate calibration parameters of each one of the cameras
observing the scene. Also, the computation of the 3D visual hull requires
silhouettes from at least two different camera viewpoints.
This work presents a novel method for the recognition of human actions using
multiple cameras at the feature fusion level but without explicitly reconstructing
the visual hull or other 3D descriptor. Experimental results reported in the liter-
ature [19,13] have shown that visual hulls can be projected into low dimensional
manifolds where most of their variance is preserved. Moreover, a silhouette is the
projection of a visual hull into the camera plane, and different works [20] have
also reported that they can be parametrized into low dimensional manifolds. The
aim of our method is to find a set of projection functions, one for each camera,
that project the corresponding silhouettes into a common low dimensional
manifold. We argue that representing the silhouettes in that common low
dimensional manifold is equivalent to the low dimensional representation
of the visual hull, so similar results can be achieved in human action recognition.
Probabilistic continuous latent variable models provide a framework for man-
ifold learning where low and high dimensional representations are related via
the factorization of their joint probability distribution. We use the
Probabilistic Canonical Correlation Analysis (PCCA) model [2] to learn the
projections of the features observed at the different cameras into a subspace
that maximizes the correlation between their components. The representation
of the observed features in that subspace is then used for action sequence
classification.
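For reference, the PCCA generative model of [2] ties the per-view observations to a single shared latent variable (notation ours; [2] derives the two-view case, and the multi-view form below is its natural extension):

\[
z \sim \mathcal{N}(0, I_d), \qquad
x_c \mid z \sim \mathcal{N}(W_c z + \mu_c, \Psi_c), \quad c = 1, \ldots, C,
\]

so each camera's descriptor is modeled as a noisy linear image of the same low dimensional latent action descriptor, and for two views the maximum likelihood estimates of the loading matrices $W_c$ span the classical canonical directions up to a linear transform [2].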
The paper is organized as follows: section 2 presents the structure of the
proposed system; section 3 reviews the Canonical Correlation Analysis model;
section 4 describes the sequence classifier used to test the system;
section 5 reports experimental validation of the method; finally, section 6
discusses the conclusions and future lines of work.
2 System Overview
The architecture of the proposed system is shown in figure 1. The images
grabbed by the C different cameras observing the scene are independently
processed to extract a sequence of action descriptors X^c = {x^c_1, ..., x^c_T},
1 <= c <= C, where T is the total number of frames grabbed. The C sequences of
action descriptors are then fused by projecting them into a common subspace,
giving a sequence of common action descriptors Z = {z_1, ..., z_T}, with
z_t = F(x^1_t, ..., x^C_t). Finally, each sequence is fed into an action
classifier to decide which action is being performed in the sequence.
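The fusion step can be sketched as follows. This is a minimal illustration, not the authors' implementation: the per-camera projection matrices W[c] are assumed already learned (e.g. by the CCA model of section 3), the descriptors are random placeholders, and averaging the projected views is one simple choice for the fusion function F.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical setup: C cameras, T frames, D-dimensional silhouette
# descriptors, and per-camera projection matrices W[c] (D x d) assumed
# to be already learned.
C, T, D, d = 3, 10, 40, 5
X = [rng.normal(size=(T, D)) for _ in range(C)]   # per-camera sequences X^c
W = [rng.normal(size=(D, d)) for _ in range(C)]   # learned projections

# Fusion F: project each camera's descriptors into the common subspace
# and average over cameras, yielding the common sequence Z = {z_1,...,z_T}.
Z = np.mean([X[c] @ W[c] for c in range(C)], axis=0)
print(Z.shape)  # one common low-dimensional descriptor per frame: (T, d)
```

The resulting T x d matrix Z is what would then be handed to the sequence classifier of section 4.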
3 Canonical Correlation Analysis
Canonical Correlation Analysis is the method we use to fuse the action
descriptors. In the following paragraphs we give an overview of its classical
and probabilistic formulations.
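As a self-contained sanity check of the classical formulation (a sketch, not the paper's code), CCA can be computed by whitening each view's covariance and taking an SVD of the whitened cross-covariance; the singular values are the canonical correlations. The two synthetic views below share a 2-D latent signal, so the leading correlations should be close to 1:

```python
import numpy as np

def cca(X, Y, k):
    """Classical CCA: find Wx, Wy maximizing corr(X @ Wx, Y @ Wy)."""
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    n = len(X)
    Cxx = X.T @ X / n + 1e-6 * np.eye(X.shape[1])  # regularized covariances
    Cyy = Y.T @ Y / n + 1e-6 * np.eye(Y.shape[1])
    Cxy = X.T @ Y / n

    def inv_sqrt(C):  # symmetric inverse square root via eigendecomposition
        w, V = np.linalg.eigh(C)
        return V @ np.diag(w ** -0.5) @ V.T

    Mx, My = inv_sqrt(Cxx), inv_sqrt(Cyy)
    # Singular values of the whitened cross-covariance = canonical correlations.
    U, s, Vt = np.linalg.svd(Mx @ Cxy @ My)
    return Mx @ U[:, :k], My @ Vt[:k].T, s[:k]

# Two views driven by the same 2-D latent signal (synthetic data).
rng = np.random.default_rng(0)
T = 500
z = rng.normal(size=(T, 2))
X = z @ rng.normal(size=(2, 8)) + 0.1 * rng.normal(size=(T, 8))
Y = z @ rng.normal(size=(2, 6)) + 0.1 * rng.normal(size=(T, 6))
Wx, Wy, corrs = cca(X, Y, 2)
```

Projecting each view with its own matrix (X @ Wx, Y @ Wy) gives maximally correlated coordinates, which is exactly the role the projections play in the fusion step of section 2.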
 