current background. In case of a significant discrepancy between the pixel and the
model, the pixel is regarded as foreground. The model is updated continuously
to account for changes in lighting conditions.
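The classify-then-update cycle described above can be sketched as a simple running-average background model. The function name, the blending factor alpha and the difference threshold below are illustrative choices, not the specific values used in the described system.

```python
import numpy as np

def update_background(frame, model, alpha=0.05, threshold=30):
    """Classify pixels against a running-average background model.

    A pixel whose value differs from the model by more than `threshold`
    is marked as foreground; the model is then blended with the new
    frame so it slowly follows gradual lighting changes.
    (Parameter names and values are illustrative.)
    """
    diff = np.abs(frame.astype(np.float64) - model)
    foreground = diff > threshold                 # binary foreground mask
    model = (1 - alpha) * model + alpha * frame   # continuous model update
    return foreground, model

# Example: a static scene with one bright "moving object" pixel
model = np.full((4, 4), 100.0)
frame = model.copy()
frame[1, 2] = 200.0
mask, model = update_background(frame, model)
```

After this single step, only the changed pixel is marked as foreground, while the model at that position has drifted slightly towards the new observation.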
The result of the object detection stage for a single video image is a set of
image regions representing moving objects. Contours of these objects may be para-
metrized using descriptors related to their size and shape, while the image section cov-
ered by each region provides data for the calculation of appearance descriptors (colour,
texture, etc.). These descriptors are essential for object re-identification [16].
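As a minimal illustration of such descriptors, the sketch below computes one size, one shape, and one appearance descriptor for a detected region. The particular descriptor set (area, bounding-box aspect ratio, intensity histogram) is an assumed example, not the specific set used in the cited work.

```python
import numpy as np

def region_descriptors(mask, image, bins=8):
    """Simple size/shape and appearance descriptors for one region.

    `mask` is a boolean array marking the region's pixels and `image`
    a greyscale frame. The chosen descriptors are illustrative.
    """
    ys, xs = np.nonzero(mask)
    area = ys.size                                   # size descriptor
    h = ys.max() - ys.min() + 1
    w = xs.max() - xs.min() + 1
    aspect = w / h                                   # shape descriptor
    hist, _ = np.histogram(image[mask], bins=bins, range=(0, 256))
    hist = hist / hist.sum()                         # appearance descriptor
    return {"area": area, "aspect": aspect, "hist": hist}

# Example: an 8-pixel rectangular region of uniform intensity
image = np.full((10, 10), 100.0)
mask = np.zeros((10, 10), dtype=bool)
mask[3:5, 2:6] = True          # 2 rows x 4 columns
desc = region_descriptors(mask, image)
```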
12.3 Multi-camera Object Tracking
Object tracking performed within a single camera is a matter of finding relations
between objects detected in consecutive video frames, so that each moving object is
tracked on a frame-to-frame basis. The result of object tracking performed in one cam-
era is the track: a set of consecutive object positions. In a multi-camera approach,
these relations are searched for among the images from different system cameras, so
that an object's tracks from a number of cameras are merged, which makes the task
more difficult. There are two general approaches to multi-camera object
tracking. Provided that the monitored area is observed by cameras with overlapping
views, the situation is quite simple: it is enough to define the relations between these
areas to re-identify objects in different images on the basis of their positions [38].
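For overlapping views, the relation between the image planes is often expressed as a planar homography estimated from corresponding ground-plane points. The sketch below applies such a mapping; the 3x3 matrix shown is a made-up example (a pure translation), used only to demonstrate the mechanics.

```python
import numpy as np

def map_position(H, point):
    """Project an image position from one camera view into another.

    H is a 3x3 homography relating two overlapping views (in practice
    estimated from corresponding ground-plane points). An object
    detected at `point` in camera A can be re-identified in camera B
    when the mapped position falls close to a detection there.
    """
    p = np.array([point[0], point[1], 1.0])  # homogeneous coordinates
    q = H @ p
    return q[:2] / q[2]                      # back to pixel coordinates

# Illustrative homography: pure translation by (50, -20) pixels
H = np.array([[1.0, 0.0,  50.0],
              [0.0, 1.0, -20.0],
              [0.0, 0.0,   1.0]])
mapped = map_position(H, (100.0, 80.0))      # → (150.0, 60.0)
```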
For disjoint camera views, the situation is more demanding. In order to recog-
nize objects between cameras, they need to be characterized by a set of features.
Again, two basic approaches can be found in the literature. In the first one, objects
are described by their biometric features; for example, re-identification of persons
may be performed using features such as gait or face [20]. However, the applicability
of these methods is strictly limited by image quality and camera place-
ment. A more popular approach to multi-camera object tracking involves extrac-
tion of image descriptors which depict object appearance [44, 47]. The selected
description method needs to ensure that the extracted features are highly robust
to the expected variations in object appearance across all cameras.
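A minimal sketch of appearance-based matching across disjoint cameras is shown below: a query descriptor is compared against a gallery of descriptors from other cameras, and the closest one is selected. The Bhattacharyya distance used here is a common choice for comparing normalized histograms, but the function names and the distance measure are assumptions, not necessarily those of the cited works.

```python
import numpy as np

def bhattacharyya(h1, h2):
    """Bhattacharyya distance between two normalized histograms;
    0 means identical distributions, 1 means no overlap."""
    bc = np.sum(np.sqrt(h1 * h2))            # Bhattacharyya coefficient
    return np.sqrt(max(0.0, 1.0 - bc))

def reidentify(query, gallery):
    """Return the index of the gallery descriptor closest to `query`.

    The gallery holds appearance descriptors of objects observed in
    other cameras; matching by minimum distance is an illustrative,
    deliberately simple re-identification rule.
    """
    dists = [bhattacharyya(query, g) for g in gallery]
    return int(np.argmin(dists))

# Example: the query matches the second gallery entry exactly
query = np.array([0.5, 0.5, 0.0, 0.0])
gallery = [np.array([0.25, 0.25, 0.25, 0.25]),
           np.array([0.5, 0.5, 0.0, 0.0]),
           np.array([0.0, 0.0, 0.5, 0.5])]
match = reidentify(query, gallery)           # → 1
```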
Surveillance systems can contain miscellaneous camera types (e.g. digital and
analogue ones), registering images under unstable lighting conditions. Hence, the
representation of an object in images from these devices can vary. In particular, this
variation can be related to different camera characteristics or settings (e.g.
white balance, exposure) and to the camera's position relative to the object (angle of view).
Additionally, the scenes observed by the cameras can be illuminated differently, result-
ing in different object appearances. Such differences can also be noticed between
indoor and outdoor surveillance cameras. Therefore, the acquired object descrip-
tion needs to be robust against such changing conditions. This can be achieved by
utilizing illumination-invariant features, which allow obtaining feature vectors that,
in most cases, do not suffer from varying conditions. On the other hand, even
 