are typically divided into two broad categories based on the field of view (FOV)
of each camera: common FOV methods [1][2], where the cameras' FOVs largely
overlap, and disjoint FOV methods [4][5][6], where tracking of an object is
handed off from one camera's FOV to another's. Traditional tracking methods
such as Kalman filters are not appropriate when the topology of the camera
network is unknown and the cameras are uncalibrated [4].
One of the classic problems in multi-camera tracking over either overlapping
or disjoint FOVs is the entry/exit problem, i.e., given that an object has left a
FOV at a particular location, which camera is most likely to see the object next,
where within that camera's FOV, and when? One solution to this problem was
presented by Javed et al. in [7]. Visual characteristics of objects were first used
to determine corresponding objects in different FOVs. Entry and exit points
in each camera's FOV were then determined using kernel density estimation.
Finally, optimal correspondences between entry and exit points were determined
using a maximum a posteriori (MAP) approach based on a bipartite graph. Javed's
method works well with independent FOV scenarios without any inter-camera
calibration. However, it is restricted by the following:
1. a training phase must be available where correspondences between tracks
are known;
2. the entire set of observations must be available; hence, the method cannot
be deployed in real-time applications; and
3. the changes in visual characteristics of objects between camera views are
assumed to happen in the same, generally predictable way.
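To make the second step of Javed's approach concrete, the following is a minimal sketch of kernel density estimation over observed exit locations. It is not the implementation from [7]: the one-dimensional coordinates, the fixed Gaussian bandwidth, and the sample values are illustrative assumptions.

```python
import math

def gaussian_kde(samples, bandwidth):
    """Build a Gaussian kernel density estimator from observed exit locations.

    Each sample contributes one Gaussian kernel; the density at x is the
    normalised sum of all kernels evaluated at x.
    """
    n = len(samples)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
        )

    return density

# Hypothetical x-coordinates (pixels) where tracked objects left the FOV:
# two clusters, suggesting two exit points along the image border.
exits = [310, 318, 305, 640, 652, 648, 660]
pdf = gaussian_kde(exits, bandwidth=15.0)

# The estimated density peaks near the two exit clusters, not between them.
print(pdf(315) > pdf(480))  # True
print(pdf(650) > pdf(480))  # True
```

In [7] the resulting density modes serve as the discrete entry/exit points between which the MAP correspondence is computed.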
In this paper, we present a unified framework to solve the multi-camera tracking
problem in both independent FOV and common FOV cases. We assume that
objects have been independently tracked in each camera in a multi-camera net-
work, as in [7], and then aim to determine correspondences between these tracks
in a decentralised way, that is, without a centralised server. As in [7], our ap-
proach requires no camera calibration, or knowledge of the relative positions of
cameras and entry and exit locations of objects.
In contrast to [7], we remove each of the constraints listed earlier. We use a
kernel-based tracking algorithm, which creates kernels over the entire FOV of
each camera rather than only at entry and exit points. Our system effectively
performs unsupervised, online learning of a correspondence model by continuous
collection and updating of tracking statistics. This allows the proposed algorithm
to be performed in real-time with no need for a dedicated and supervised training
phase, thereby lifting constraints 1 and 2. To enable this collection of tracking
statistics we introduce the concept of reusing kernels, and show that by using
this technique the memory usage of the system is bounded. We then introduce
a location-based kernel matching method to address abrupt changes in visual
characteristics of objects (often due to changes in object pose or camera angle)
based on the historical data available through reusing kernels, thereby lifting con-
straint 3. This enables us to develop a lightweight, decentralised, multi-camera
tracking solution with limited communication between cameras ensuring that an
on-camera implementation is possible without requiring a coordinating server.
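The bounded-memory property of reused kernels can be illustrated with a simplified sketch: if kernels are drawn from a fixed grid covering each FOV rather than created per observation, the table of inter-camera transition statistics has a fixed maximum size and can be updated online. The grid resolution, image dimensions, and helper names below are illustrative assumptions, not the paper's actual kernel construction.

```python
from collections import defaultdict

GRID = 8  # kernels per axis; bounds the table at GRID**4 counters per camera pair

def kernel_id(x, y, width=640, height=480):
    """Map an image location to the id of the reused grid kernel covering it."""
    return (min(int(x * GRID / width), GRID - 1),
            min(int(y * GRID / height), GRID - 1))

# (exit_kernel, entry_kernel) -> observation count, updated online
transitions = defaultdict(int)

def observe(exit_xy, entry_xy):
    """Record one exit-to-entry hand-off between two cameras."""
    transitions[(kernel_id(*exit_xy), kernel_id(*entry_xy))] += 1

def most_likely_entry(exit_xy):
    """Return the entry kernel with the highest empirical transition count."""
    k = kernel_id(*exit_xy)
    candidates = {e: c for (s, e), c in transitions.items() if s == k}
    return max(candidates, key=candidates.get) if candidates else None

# Simulated hand-offs: objects leaving near the right edge of camera A
# tend to reappear near the left edge of camera B.
for _ in range(20):
    observe((630, 240), (10, 240))
observe((630, 240), (300, 400))  # one spurious correspondence

print(most_likely_entry((625, 250)))  # the left-edge entry kernel, (0, 4)
```

Because the set of kernels is fixed in advance, continuous collection of statistics never grows the table beyond its grid-determined bound, which is what makes an on-camera, training-free deployment plausible.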