Fig. 10.1. Face detection system proposed by Rowley et al. (image adapted from [198]).
Sung and Poggio [221] first proposed the use of a mixture of Gaussians to model
faces. They feed the distances to face cluster centroids into a multi-layer
perceptron (MLP) that is trained for face/non-face classification.
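
The idea of using centroid distances as classifier input can be sketched as
follows. This is a minimal illustration, not the method of [221]: it assumes
plain Euclidean distance, toy 4-pixel "windows", and hand-picked centroids, and
the function name centroid_distances is hypothetical. The resulting list would
serve as the input vector of the MLP.

```python
import math

def centroid_distances(window, centroids):
    """Return the distance from a flattened window to every cluster centroid.

    Assumption: plain Euclidean distance is used here for simplicity.
    """
    return [math.dist(window, centroid) for centroid in centroids]

# Toy data: two hypothetical cluster centroids in a 4-pixel space.
centroids = [
    [0.0, 0.0, 0.0, 0.0],
    [1.0, 1.0, 1.0, 1.0],
]
window = [1.0, 1.0, 0.0, 0.0]

# One distance per centroid; this vector is what a classifier would see.
features = centroid_distances(window, centroids)
```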
Neural networks are a popular technique for pattern recognition problems,
including face detection. The first advanced neural approach that reported
results on a large, difficult dataset was published by Rowley et al. [198].
Their system incorporates face knowledge in a retinotopically connected neural
network, shown in Fig. 10.1. The network is designed to analyze windows of
20 × 20 pixels. It has one hidden layer with 26 units: 4 units access 10 × 10
pixel subregions, 16 look at 5 × 5 subregions, and 6 receive input from
overlapping horizontal stripes of size 20 × 5. The input window is preprocessed
by lighting correction and histogram equalization. Recently, Rowley et al. [199]
combined this system with a router neural network to detect faces situated at
all angles in the image plane.
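
The receptive-field layout of the 26 hidden units can be enumerated with a
short sketch. The unit counts and region sizes are taken from the description
above. The exact placement is an assumption: the 10 × 10 and 5 × 5 regions are
taken to tile the window without overlap, and the stripes are given a vertical
stride of 3 pixels, which yields exactly six stripes in a 20-pixel-high window.
The function name receptive_fields is hypothetical.

```python
def receptive_fields(size=20, stripe_height=5, stripe_stride=3):
    """List (top, left, height, width) input regions, one per hidden unit."""
    fields = []

    # 4 units: 10 x 10 quadrants (assumed non-overlapping).
    for top in range(0, size, 10):
        for left in range(0, size, 10):
            fields.append((top, left, 10, 10))

    # 16 units: 5 x 5 blocks (assumed non-overlapping).
    for top in range(0, size, 5):
        for left in range(0, size, 5):
            fields.append((top, left, 5, 5))

    # 6 units: full-width horizontal stripes, 5 rows high.
    # Assumed stride of 3 rows, so neighbouring stripes overlap by 2 rows.
    for top in range(0, size - stripe_height + 1, stripe_stride):
        fields.append((top, 0, stripe_height, size))

    return fields

fields = receptive_fields()
num_units = len(fields)

# Number of input-to-hidden connections implied by this layout.
num_input_weights = sum(height * width for _, _, height, width in fields)
```

Under these assumptions each pixel is seen by one quadrant unit, one block
unit, and one or two stripe units, so the hidden layer is far more sparsely
connected than a fully connected layer over 400 inputs.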
Apart from linear subspace methods and neural networks, there are several
other statistical approaches to image-based face detection, such as systems
based on information theory or support vector machines. For example,
Schneiderman and Kanade [206] use products of histograms of wavelet
coefficients. They employ multiple views to detect 3D objects like cars and
faces in different poses. Support vector machines are used, for example, by
Heisele et al. [92], who describe a one-step detector for entire faces and a
component-based hierarchical detector.
Searching for feature combinations, matching features with translated, rotated,
and scaled face models, as well as scanning windows over all positions and
scales are time-consuming procedures that may limit the applicability of the
above methods to real-time tasks. Furthermore, heuristics must be employed to
prevent multiple detections of the same face at nearby locations or scales.
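
To make the cost argument concrete, the sketch below counts the windows a
scanning detector would evaluate and shows one simple heuristic for merging
nearby detections. All parameters are assumptions chosen for illustration (a
20 × 20 window, a scale step of 1.2, a minimum centre distance of 10 pixels),
the function names are hypothetical, and the merging rule ignores scale. It is
not the heuristic used by any of the cited systems.

```python
def count_windows(width, height, window=20, stride=1, scale_step=1.2):
    """Count window positions over all levels of a downscaled image pyramid."""
    total = 0
    scale = 1.0
    while True:
        w = int(width / scale)
        h = int(height / scale)
        if w < window or h < window:
            break  # the image is now smaller than the window
        total += ((w - window) // stride + 1) * ((h - window) // stride + 1)
        scale *= scale_step
    return total

def merge_detections(detections, min_distance=10):
    """Greedily keep the best-scoring detection among close neighbours.

    Each detection is (x, y, score). A detection is dropped if its centre lies
    closer than min_distance to an already kept, higher-scoring one.
    """
    kept = []
    for x, y, score in sorted(detections, key=lambda d: d[2], reverse=True):
        far_from_all = all(
            (x - kx) ** 2 + (y - ky) ** 2 >= min_distance ** 2
            for kx, ky, _ in kept
        )
        if far_from_all:
            kept.append((x, y, score))
    return kept

# Hypothetical raw detections: two responses to the same face, one elsewhere.
raw = [(10, 10, 0.9), (12, 11, 0.8), (50, 50, 0.7)]
merged = merge_detections(raw)
```

Because the window count grows with the image area at every pyramid level,
each additional classifier evaluation per window multiplies directly into the
total running time.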
In the following, a method is described that uses an instantiation of the
Neural Abstraction Pyramid architecture, introduced in Chapter 4, to localize a
face in gray-scale still images. The network operates by iteratively refining
an initial solution. Multiresolution versions of entire images are presented
directly to the network, and it is trained with supervision to localize the
face as fast as possible. Thus, no