on the whole database. An interpretation evaluation metric, taking into account
both aspects and working on a single interpretation result, is then needed.
This article presents our work on the development of vision-based systems for
human detection and tracking in a known environment using a static camera, and
on the definition of an adaptable performance measure able to simultaneously
evaluate the localization, the recognition and the detection of interpreted
objects in a real scene against a manually made ground truth. Although, in
general, the localization and the recognition both have to be as precise as
possible, the relative importance of these two aspects can change depending on
the foreseen application. Section 2 describes the successive algorithms
implemented for the CAPTHOM project, which focused more particularly on indoor
environments. The proposed evaluation metric for a general image interpretation
result is presented in section 3. Its potential interest is illustrated in
section 4 on the CAPTHOM project. Section 5 presents conclusions and
perspectives of this study.
2 Visual-Based System Developments for Human
Detection in Image Sequences
Within the CAPTHOM project, we aim to develop a human detection system to
limit the power consumption of buildings and to monitor persons with reduced
mobility. This project is one of the numerous applications of human detection
systems for home automation, video surveillance, etc. The foreseen system must
be easily tunable and embeddable, providing an optimal compromise between
false detection rate and algorithmic complexity.
The development of a reliable human detection system for videos must deal with
general object detection difficulties (background complexity, illumination
conditions, etc.) and with other constraints specific to human detection
(high variability in skin color, weight and clothes, presence of partial
occlusions, highly articulated body resulting in various appearances, etc.).
Despite these difficulties, some very promising systems have already been
proposed in the literature. This is especially the case of the method proposed
by Viola and Jones [8], which attempts to detect humans in still images using a
well-suited representation of human shapes and a classification method. We
first implemented this method in a sliding window framework analyzing every
image and using several classifiers. This method is based on Haar-like filters
and AdaBoost. In an indoor environment, partial occlusions are frequent, and
the upper part of the body (head and shoulders) is often the only visible
part. As it is clearly insufficient to search the image only for shapes similar
to the whole human body, we implemented four classifiers: the whole body, the
upper body (front/back view), the upper body (left view) and the upper body
(right view). In practice, each classifier scans the image with a constant
shift in the horizontal and vertical directions. As the size of the person
potentially present is not known a priori and the classifier has a fixed size,
the image is analyzed several times at different scales: the size of the image
is divided by a scale factor (sf) between two successive scales. This method is
called Viola [8] in the following paragraphs.
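
The multi-scale scan described above can be illustrated with a minimal Python
sketch. It only enumerates the candidate windows that a fixed-size classifier
would be asked to evaluate; it does not implement the Haar-like filters or the
AdaBoost classifiers themselves. The function name sliding_windows, the window
size and the shift value are assumptions made for the illustration and are not
taken from the CAPTHOM implementation. Only the constant shift and the scale
factor sf come from the text.

```python
from typing import Iterator, Tuple

Window = Tuple[int, int, int, int]  # (x, y, width, height) in original-image pixels


def sliding_windows(img_w: int, img_h: int,
                    win_w: int, win_h: int,
                    shift: int, sf: float) -> Iterator[Window]:
    """Enumerate candidate windows for a fixed-size classifier.

    The image size is divided by the scale factor `sf` between two
    successive scales, and the window is moved by a constant `shift`
    (in pixels of the rescaled image) horizontally and vertically.
    Hypothetical helper: names and defaults are illustrative only.
    """
    if sf <= 1.0:
        raise ValueError("sf must be greater than 1 so that the image shrinks")
    if shift < 1:
        raise ValueError("shift must be at least 1 pixel")

    scale = 1.0  # cumulative reduction applied to the original image
    while True:
        cur_w = int(img_w / scale)
        cur_h = int(img_h / scale)
        # Stop once the classifier window no longer fits in the rescaled image.
        if cur_w < win_w or cur_h < win_h:
            break
        for y in range(0, cur_h - win_h + 1, shift):
            for x in range(0, cur_w - win_w + 1, shift):
                # Map the window back to original-image coordinates.
                yield (int(x * scale), int(y * scale),
                       int(win_w * scale), int(win_h * scale))
        scale *= sf


if __name__ == "__main__":
    # Assumed example values: 256x256 image, 64x128 window, shift 64, sf 2.
    for window in sliding_windows(256, 256, 64, 128, shift=64, sf=2.0):
        print(window)
```

In this sketch, a larger sf or a larger shift reduces the number of windows to
classify, and therefore the computational cost, at the price of a coarser
search over positions and sizes.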
 