3.2 Feature Extraction and Object Recognition
There are several ways to build a multi-camera surveillance system
[7,12,13]. Because our goal is to acquire as much information about objects
as possible, we use visual surveillance information retrieval instead of
(multi-)camera homography or handover regions as in [7]. Moreover, the area might be
large and objects would occlude one another in those regions.
Although there are many types of features that can be extracted [14], we primarily
use descriptors based on the visual part of MPEG-7 [9]. We try not to rely on color
descriptors alone, as in [13], because most airport passengers (at least in the British
Isles) wear black coats and there are many dark metallic cars there.
However, we have adopted the color layout concept, where each object is resampled
to 8x8 pixels in the Y'CbCr color model. Then, the descriptor coefficients are
extracted in zig-zag order from its discrete cosine transform, similarly to JPEG. Another
(texture) descriptor is based on extracting energy from the (Fourier) frequency
domain bands defined by a bank of Gabor filters [9].
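
The color layout steps described above (resampling to 8x8, conversion to Y'CbCr, DCT, zig-zag scan) can be sketched as follows. This is a minimal illustration in plain Python, not the implementation used in the system: the function names are hypothetical, the BT.601 full-range conversion constants are the ones used by JPEG, the choice of 6 luma and 3 + 3 chroma coefficients is an assumption, and the coefficient quantization that MPEG-7 applies afterwards is left out.

```python
import math

N = 8  # descriptor grid size (8x8, as in the text)

def rgb_to_ycbcr(r, g, b):
    """Convert one RGB pixel to Y'CbCr (BT.601, full range, as in JPEG)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return y, cb, cr

def resample_8x8(image):
    """Average an H x W image (rows of (r, g, b) tuples) over an 8x8 grid."""
    h, w = len(image), len(image[0])
    grid = []
    for i in range(N):
        r0 = i * h // N
        r1 = max((i + 1) * h // N, r0 + 1)  # at least one row per cell
        row = []
        for j in range(N):
            c0 = j * w // N
            c1 = max((j + 1) * w // N, c0 + 1)  # at least one column per cell
            cell = [image[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            row.append(tuple(sum(p[k] for p in cell) / len(cell)
                             for k in range(3)))
        grid.append(row)
    return grid

def dct_2d(block):
    """Orthonormal 2-D DCT-II of an 8x8 block."""
    def alpha(k):
        return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)
    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            s = 0.0
            for x in range(N):
                for y in range(N):
                    s += (block[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[u][v] = alpha(u) * alpha(v) * s
    return out

def zigzag_indices(n=N):
    """Return (row, col) pairs in JPEG zig-zag order."""
    order = []
    for s in range(2 * n - 1):
        diag = [(i, s - i) for i in range(n) if 0 <= s - i < n]
        # Odd anti-diagonals run top-right to bottom-left, even ones reversed.
        order.extend(diag if s % 2 else reversed(diag))
    return order

def color_layout(image, n_y=6, n_c=3):
    """Return [Y, Cb, Cr] lists of low-frequency DCT coefficients.

    n_y and n_c (coefficients kept per channel) are assumed values.
    """
    grid = resample_8x8(image)
    ycc = [[rgb_to_ycbcr(*px) for px in row] for row in grid]
    zz = zigzag_indices()
    descriptor = []
    for channel, count in ((0, n_y), (1, n_c), (2, n_c)):
        plane = [[ycc[i][j][channel] for j in range(N)] for i in range(N)]
        coeffs = dct_2d(plane)
        descriptor.append([coeffs[i][j] for i, j in zz[:count]])
    return descriptor
```

For a uniformly colored object only the first (DC) coefficient of each channel is non-zero, which is why the low-frequency coefficients capture the coarse spatial layout of color.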
For object classification we also use local features (such as SIFT and
SURF) and a simple region (blob) shape descriptor. The shape, together with the
previously described object metadata, then acts as the input to a classification
algorithm in the recognition procedure of the CVM. The object recognition
process is based on two popular machine learning methods - AdaBoost and support
vector machines (SVM) - in their OpenCV [8] implementation. The system has a
simple training GUI in which clicking an object while holding a key
associates the blob with its appropriate class or changes the class of a misclassified
sample.
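
To make the boosting step concrete, the following is a from-scratch sketch of discrete AdaBoost over threshold stumps. It is not the OpenCV implementation the system uses; the two blob features in the usage example (height/width ratio and area in pixels) and all function names are hypothetical and chosen only for illustration.

```python
import math

def train_stump(X, y, w):
    """Return (error, feature, threshold, polarity) with lowest weighted error."""
    best = None
    for f in range(len(X[0])):
        for thr in sorted({x[f] for x in X}):
            for pol in (1, -1):
                err = sum(wi for xi, yi, wi in zip(X, y, w)
                          if (pol if xi[f] >= thr else -pol) != yi)
                if best is None or err < best[0]:
                    best = (err, f, thr, pol)
    return best

def stump_predict(stump, x):
    """Classify x as +1 or -1 with a single threshold stump."""
    _, f, thr, pol = stump
    return pol if x[f] >= thr else -pol

def adaboost_train(X, y, rounds=10):
    """Train on feature vectors X with labels y in {+1, -1}."""
    n = len(X)
    w = [1.0 / n] * n  # start with uniform sample weights
    model = []
    for _ in range(rounds):
        stump = train_stump(X, y, w)
        # Clamp the error so the logarithm below stays finite.
        err = min(max(stump[0], 1e-10), 1 - 1e-10)
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((alpha, stump))
        # Increase the weight of misclassified samples, then renormalize.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, xi))
             for xi, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
        if stump[0] < 1e-10:
            break  # a single stump already separates the training set
    return model

def adaboost_predict(model, x):
    """Sign of the alpha-weighted vote of all stumps."""
    score = sum(alpha * stump_predict(stump, x) for alpha, stump in model)
    return 1 if score >= 0 else -1

# Hypothetical blobs: [height/width ratio, area]; +1 = person, -1 = car.
X = [[2.5, 900], [3.0, 1200], [2.8, 700],
     [0.5, 5000], [0.4, 6500], [0.6, 4000]]
y = [1, 1, 1, -1, -1, -1]
model = adaboost_train(X, y)
```

Re-labelling a misclassified sample, as the training GUI allows, corresponds here to changing one entry of `y` and training again.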
To avoid this, the CVM may use AdaBoost object detection based on Haar features,
similarly to the OpenCV face detection. Unfortunately, only a few
faces can be detected in standard TV resolution video with a camera setup similar
to the MCT dataset. The detector is followed by the MPEG-7 Face recognition
descriptor [9]. Other face recognition approaches will be compared in the future
to allow more precise and consistent object tracking and recognition in low-resolution
images and video. Thus, we concentrate more on retrieval methods at
the moment.
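
As background for the Haar-feature detection mentioned above, the sketch below shows how one two-rectangle Haar-like feature can be evaluated in constant time from an integral image. It illustrates the general technique only; the function names are hypothetical, and a real detector such as the OpenCV cascade evaluates many such features selected by boosting.

```python
def integral_image(img):
    """Summed-area table with an extra zero row and column at the top/left."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for r in range(h):
        row_sum = 0
        for c in range(w):
            row_sum += img[r][c]
            ii[r + 1][c + 1] = ii[r][c + 1] + row_sum
    return ii

def rect_sum(ii, top, left, height, width):
    """Sum of pixels in a rectangle, using four table lookups."""
    return (ii[top + height][left + width] - ii[top][left + width]
            - ii[top + height][left] + ii[top][left])

def haar_edge_feature(ii, top, left, height, width):
    """Two-rectangle feature: left half minus right half of the window."""
    half = width // 2
    return (rect_sum(ii, top, left, height, half)
            - rect_sum(ii, top, left + half, height, half))

# A 4x4 grayscale patch with a bright left half and a dark right half.
patch = [[10, 10, 0, 0] for _ in range(4)]
ii = integral_image(patch)
response = haar_edge_feature(ii, 0, 0, 4, 4)
```

A large response indicates a vertical edge inside the window; on a uniform patch the response is zero.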
4 Surveillance Information Retrieval
Although the basics of wide area surveillance systems with non-overlapping
fields of view have been published [10], these systems suffer from multiple deficiencies
caused by the curse of dimensionality - e.g. they allow only simple handover
regions [7] or they cannot support a crime investigation process [12,13],
because the real recordings are too massive and of too low quality to be analyzed
efficiently (as in the CSI: NY series).
The metadata coming from CVMs - local IDs, trajectories and object descriptions -
must be cleaned, integrated, indexed and stored so that it can be queried and
analyzed, as illustrated in figure 2.
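
One way to picture the clean, integrate, index, store and query chain is the small sketch below, which uses an in-memory SQLite database. The schema, the column names and the cleaning rules (dropping incomplete and duplicate records) are assumptions made for illustration; the source does not specify the actual storage design.

```python
import sqlite3

def build_store():
    """Create an in-memory database for CVM observations (assumed schema)."""
    db = sqlite3.connect(":memory:")
    db.execute("""
        CREATE TABLE observation (
            camera_id INTEGER NOT NULL,
            local_id  INTEGER NOT NULL,
            t         REAL    NOT NULL,
            x         REAL    NOT NULL,
            y         REAL    NOT NULL,
            PRIMARY KEY (camera_id, local_id, t)
        )""")
    # Index on time so that queries over a time window stay cheap.
    db.execute("CREATE INDEX idx_observation_time ON observation (t)")
    return db

def clean(records):
    """Yield records that are complete and not duplicated."""
    seen = set()
    for rec in records:
        if any(value is None for value in rec):
            continue  # incomplete record
        key = rec[:3]  # (camera_id, local_id, t)
        if key in seen:
            continue  # duplicate observation
        seen.add(key)
        yield rec

def store(db, records):
    """Clean the records and insert them into the observation table."""
    db.executemany("INSERT INTO observation VALUES (?, ?, ?, ?, ?)",
                   list(clean(records)))
    db.commit()

def trajectory(db, camera_id, local_id):
    """Return the time-ordered (t, x, y) points of one locally tracked object."""
    cur = db.execute(
        "SELECT t, x, y FROM observation "
        "WHERE camera_id = ? AND local_id = ? ORDER BY t",
        (camera_id, local_id))
    return cur.fetchall()

# Hypothetical records from two CVMs: (camera_id, local_id, t, x, y).
records = [
    (1, 7, 1.0, 12.0, 21.0),
    (1, 7, 0.0, 10.0, 20.0),
    (1, 7, 1.0, 12.0, 21.0),   # duplicate
    (2, 3, 0.5, None, 5.0),    # incomplete
    (2, 3, 1.5, 40.0, 6.0),
]
db = build_store()
store(db, records)
```

Calling `trajectory(db, 1, 7)` then returns the two remaining points of object 7 from camera 1 in time order.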
 