Compressive Sensing for Vision - Sparse Representations and Compressive Sensing for Imaging and Vision


4.1.4 Compressive Sensing for Multi-View Tracking

Another direct application of CS to a data-rich tracking problem is presented in
[116]. Specifically, a method is developed that uses multiple sensors to perform
multi-view tracking with a coding scheme based on compressive sensing.
Assuming that the observed data contains no background component (this could be
realized, e.g., by preprocessing with any of the background subtraction techniques
previously discussed), the method uses known information about the sensor
geometry to facilitate a common data encoding scheme based on CS. After the data
from each camera is received at a central processing station, it is fused via CS
decoding, and the resulting image or three-dimensional grid can be used for tracking.

The first case considered is one where all objects of interest exist in a known
ground plane. It is assumed that the geometric transformation between it and each
sensor plane is known. That is, if there are $C$ cameras, then the homographies
$\{H_j\}_{j=1}^{C}$ are known. The relationship between coordinates $(u, v)$ in the
$j$th image and the corresponding ground plane coordinates $(x, y)$ is determined
by $H_j$ as

$$\begin{bmatrix} u \\ v \\ 1 \end{bmatrix} \sim H_j \begin{bmatrix} x \\ y \\ 1 \end{bmatrix}, \tag{4.10}$$

where the coordinates are written in accordance with their homogeneous
representation. Since $H_j$ can vary widely across the set of cameras due to varying
viewpoint, an encoding scheme designed to achieve a common data representation is
presented.
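The mapping in (4.10) holds only up to scale, so the image coordinates are obtained by dividing out the third homogeneous component. The following is a minimal sketch of that step in plain Python; the function name `apply_homography` and the matrix `H_example` are hypothetical and chosen only for illustration, not taken from [116].

```python
def apply_homography(H, x, y):
    """Map a ground-plane point (x, y) to image coordinates (u, v).

    H is a 3x3 homography given as a list of three rows.
    """
    # Multiply H by the homogeneous vector [x, y, 1].
    p = [H[r][0] * x + H[r][1] * y + H[r][2] for r in range(3)]
    if p[2] == 0:
        raise ValueError("point maps to infinity under this homography")
    # The relation in (4.10) is defined up to scale, so normalize by p[2].
    return p[0] / p[2], p[1] / p[2]


# Hypothetical homography: scale by 2, shift by (10, 20), small perspective term.
H_example = [
    [2.0, 0.0, 10.0],
    [0.0, 2.0, 20.0],
    [0.0, 0.001, 1.0],
]

u, v = apply_homography(H_example, 5.0, 0.0)
print(u, v)  # 20.0 20.0
```

Because of the normalization, multiplying every entry of `H_example` by a nonzero constant leaves the returned coordinates unchanged, which is what the "∼" in (4.10) expresses.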

First, the ground plane is sampled, yielding a discrete set of coordinates
$\{(x_i, y_i)\}_{i=1}^{N}$. An occupancy vector, $x$, is defined over these
coordinates, where $x(i) = 1$ if foreground is present at the corresponding
coordinates and is 0 otherwise. For each camera's observed foreground image in the
set $\{I_j\}_{j=1}^{C}$, an occupancy vector $y_j$ is formed as
$y_j(i) = I_j(u_i, v_i)$, where $(u_i, v_i)$ are the (rounded) image plane
coordinates corresponding to $(x_i, y_i)$ obtained via (4.10). Thus,
$y_j = x + e_j$, where $e_j$ represents any error due to the coordinate rounding
and other noise. Figure 4.3 illustrates the physical configuration of the system.
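As an illustration of how a per-camera occupancy vector might be formed, the sketch below projects each ground-plane sample through a homography, rounds to the nearest pixel, and reads the binary foreground image at that location. The names `project` and `occupancy_vector`, the convention that `u` indexes columns and `v` indexes rows, and the choice to treat samples that fall outside the image as unoccupied are all assumptions made for this example.

```python
def project(H, x, y):
    """Apply a 3x3 homography H to the ground-plane point (x, y), as in (4.10)."""
    p = [H[r][0] * x + H[r][1] * y + H[r][2] for r in range(3)]
    return p[0] / p[2], p[1] / p[2]


def occupancy_vector(image, H, grid):
    """Sample a binary foreground image at the projected grid points.

    image: list of rows of 0/1 values (foreground mask for one camera).
    grid:  list of ground-plane sample coordinates (x_i, y_i).
    """
    rows, cols = len(image), len(image[0])
    y_j = []
    for gx, gy in grid:
        u, v = project(H, gx, gy)
        # Round to the nearest pixel; assumed convention: u -> column, v -> row.
        col, row = int(round(u)), int(round(v))
        inside = 0 <= row < rows and 0 <= col < cols
        # Assumption: samples that project outside the image count as empty.
        y_j.append(image[row][col] if inside else 0)
    return y_j


# Toy setup: 3x3 ground-plane grid, identity homography, one foreground pixel.
identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
grid = [(gx, gy) for gy in range(3) for gx in range(3)]
foreground = [[0, 0, 0],
              [0, 0, 1],
              [0, 0, 0]]

y_1 = occupancy_vector(foreground, identity, grid)
print(y_1)  # [0, 0, 0, 0, 0, 1, 0, 0, 0]
```

Since every camera samples the same ground-plane grid, the resulting vectors share a common indexing regardless of viewpoint, which is the point of the encoding scheme.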

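Noting that $x$ is often sparse, the camera data $\{y_j\}_{j=1}^{C}$ is encoded
using compressive sensing. First, $C$ measurement matrices $\{\Phi_j\}_{j=1}^{C}$
of equal dimension are formed according to a construction that affords them the RIP
of appropriate order for $x$. Next, the camera data is projected into the
lower-dimensional space by computing $\tilde{y}_j = \Phi_j y_j$, $j = 1, \ldots, C$.
This lower-dimensional data is transmitted to a central station, where it is
ordered into the following structure:

$$\begin{bmatrix} \tilde{y}_1 \\ \vdots \\ \tilde{y}_C \end{bmatrix} = \begin{bmatrix} \Phi_1 \\ \vdots \\ \Phi_C \end{bmatrix} x + \begin{bmatrix} \tilde{e}_1 \\ \vdots \\ \tilde{e}_C \end{bmatrix}, \tag{4.11}$$

where $\tilde{e}_j = \Phi_j e_j$ is the projected error, which follows from
$y_j = x + e_j$.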
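The encoding and stacking steps can be sketched in a few lines of plain Python. The source does not say which matrix construction is used, so this example assumes random sign matrices with entries ±1/√M, one common random construction known to satisfy the RIP with high probability. The sizes `N`, `M`, `C`, the helper names, and the toy error pattern are all hypothetical.

```python
import random


def random_sign_matrix(m, n, rng):
    """m x n matrix with i.i.d. entries +1/sqrt(m) or -1/sqrt(m) (assumed construction)."""
    scale = 1.0 / m ** 0.5
    return [[scale * rng.choice((-1.0, 1.0)) for _ in range(n)] for _ in range(m)]


def matvec(A, v):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(a * b for a, b in zip(row, v)) for row in A]


def encode_and_stack(camera_vectors, matrices):
    """Project each y_j with its own Phi_j and concatenate the results, as in (4.11)."""
    stacked = []
    for Phi_j, y_j in zip(matrices, camera_vectors):
        stacked.extend(matvec(Phi_j, y_j))
    return stacked


rng = random.Random(0)
N, M, C = 12, 4, 3            # grid size, measurements per camera, cameras

x = [0] * N                   # sparse ground-plane occupancy
x[2] = x[7] = 1

errors = [[0] * N for _ in range(C)]
errors[1][5] = 1              # camera 2 reports one spurious foreground sample

# Each camera observes y_j = x + e_j.
camera_vectors = [[xi + ei for xi, ei in zip(x, e_j)] for e_j in errors]
matrices = [random_sign_matrix(M, N, rng) for _ in range(C)]

stacked = encode_and_stack(camera_vectors, matrices)

# Right-hand side of (4.11): stacked matrices applied to x, plus projected errors.
big_Phi = [row for Phi_j in matrices for row in Phi_j]
projected_errors = [val for Phi_j, e_j in zip(matrices, errors)
                    for val in matvec(Phi_j, e_j)]
rhs = [a + b for a, b in zip(matvec(big_Phi, x), projected_errors)]

print(len(stacked))  # 12, i.e. C * M measurements in total
```

Each camera transmits only `M` numbers instead of `N`, and the central station sees a single linear system in the shared unknown `x`, which is what allows the views to be fused by one sparse recovery step.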
