Multicamera Action Recognition with Canonical
Correlation Analysis and Discriminative
Sequence Classification
Rodrigo Cilla, Miguel A. Patricio, Antonio Berlanga, and Jose M. Molina
Computer Science Department. Universidad Carlos III de Madrid
Avda. de la Universidad Carlos III, 22. 28270 Colmenarejo, Madrid, Spain
{rcilla,mpatrici}@inf.uc3m.es, {berlanga,molina}@ia.uc3m.es
Abstract. This paper presents a feature fusion approach to the recognition of human actions from multiple cameras that avoids the computation of the 3D visual hull. Action descriptors are extracted for each available camera view and projected into a common subspace that maximizes the correlation between the components of the projections. That common subspace is learned using Probabilistic Canonical Correlation Analysis, and action classification is performed in it with a discriminative classifier. Results of the proposed method are reported on the IXMAS dataset.
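The fusion step described above can be illustrated with a minimal sketch of classical (non-probabilistic) CCA in NumPy: descriptor matrices from two camera views are projected into a shared subspace where corresponding components are maximally correlated. The function name `cca`, the ridge term `reg`, and the synthetic two-view data are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def cca(X, Y, n_components=2, reg=1e-6):
    """Classical CCA via SVD of the whitened cross-covariance.

    X, Y are (n_samples, n_features) descriptor matrices from two
    views. Returns projection matrices Wx, Wy mapping each view into
    a shared subspace, plus the canonical correlations. A small ridge
    term `reg` keeps the covariance inverses numerically stable.
    """
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    n = X.shape[0]
    Cxx = X.T @ X / (n - 1) + reg * np.eye(X.shape[1])
    Cyy = Y.T @ Y / (n - 1) + reg * np.eye(Y.shape[1])
    Cxy = X.T @ Y / (n - 1)

    def inv_sqrt(C):
        # symmetric inverse square root via eigendecomposition
        w, V = np.linalg.eigh(C)
        return V @ np.diag(1.0 / np.sqrt(w)) @ V.T

    Kx, Ky = inv_sqrt(Cxx), inv_sqrt(Cyy)
    U, s, Vt = np.linalg.svd(Kx @ Cxy @ Ky)
    Wx = Kx @ U[:, :n_components]
    Wy = Ky @ Vt.T[:, :n_components]
    return Wx, Wy, s[:n_components]

# Two noisy views of the same latent action signal (synthetic data)
rng = np.random.default_rng(0)
z = rng.standard_normal((200, 2))                      # shared latent
X = z @ rng.standard_normal((2, 5)) + 0.1 * rng.standard_normal((200, 5))
Y = z @ rng.standard_normal((2, 4)) + 0.1 * rng.standard_normal((200, 4))
Wx, Wy, corrs = cca(X, Y)
u = (X - X.mean(axis=0)) @ Wx[:, 0]
v = (Y - Y.mean(axis=0)) @ Wy[:, 0]
```

In the paper's setting the projected coordinates (here `u`, `v`) would then be fed to a discriminative sequence classifier; the probabilistic formulation additionally yields a generative latent-variable model of the shared subspace.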
1 Introduction
The recognition of human actions has received increasing attention from the computer vision community in recent years [10]. One of the current trends in the field is how to efficiently combine the perceptions grabbed from different viewpoints in order to build more robust action recognition systems. Such a system can cover wider scenes and cope with the occlusions caused by walls and furniture that would make recognition from a single view very difficult, if not impossible.
Although human action recognition systems have been proposed at the different sensor fusion levels defined by Dasarathy [5], such as [4,15] at the decision-in decision-out level or [22,12] at the feature-in decision-out level, the most successful approaches operate at the feature-in feature-out level. These approaches extract human silhouettes from the different cameras, using for example background subtraction [16], and then reconstruct the 3D visual hull of the human [9] as the feature used for recognition. Along this line, Weinland et al. [21] proposed the Motion History Volumes (MHV) as a 3D extension of the popular Motion History Image (MHI) [3]; action classification is then made using Fourier analysis of the MHV. Peng et al. [13] performed multilinear analysis of the voxels in the visual hull. Turaga et al. [19] studied visual hulls using Stiefel and Grassmann manifolds, reporting the best action recognition results in 3D to date. The main drawback of these methods is that 3D visual hull reconstruction has a high computational cost.