Face Localization - Hierarchical Neural Networks for Image Interpretation

Fig. 10.1. Face detection system proposed by Rowley et al. (image adapted from [198]).

Sung and Poggio [221] first proposed the use of a mixture of Gaussians to model faces. They feed the distances to the face cluster centroids as inputs to a multi-layer perceptron (MLP) trained for face/non-face classification.
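The two-stage idea (distances to cluster centroids become the features of a classifier) can be sketched in a few lines of Python. The simplifications are our own assumptions, not Sung and Poggio's exact formulation: each cluster is summarized by a mean and a diagonal variance, the only distance used is a diagonal Mahalanobis distance, and the MLP has a single tanh hidden layer with untrained placeholder weights. The names `cluster_distances` and `mlp_score` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# A cluster is (mean, variance), both vectors of the window's dimension.
# Diagonal covariance is a simplifying assumption of this sketch.
Cluster = Tuple[Sequence[float], Sequence[float]]


def cluster_distances(x: Sequence[float], clusters: Sequence[Cluster]) -> List[float]:
    """Diagonal Mahalanobis distance from window vector x to each centroid."""
    return [
        math.sqrt(sum((xi - mi) ** 2 / vi for xi, mi, vi in zip(x, mean, var)))
        for mean, var in clusters
    ]


def mlp_score(features: Sequence[float],
              w_hidden: Sequence[Sequence[float]], b_hidden: Sequence[float],
              w_out: Sequence[float], b_out: float) -> float:
    """One tanh hidden layer and a tanh output; > 0 is read as 'face'."""
    hidden = [math.tanh(b + sum(w * f for w, f in zip(row, features)))
              for row, b in zip(w_hidden, b_hidden)]
    return math.tanh(b_out + sum(w * h for w, h in zip(w_out, hidden)))


# Toy usage: two 2-D clusters and hand-picked (untrained) weights.
clusters = [([0.0, 0.0], [1.0, 1.0]), ([3.0, 4.0], [4.0, 4.0])]
feats = cluster_distances([0.0, 0.0], clusters)
print(feats)  # [0.0, 2.5]
print(mlp_score(feats, [[-1.0, 1.0]], [0.0], [1.0], 0.0))
```

In a real detector the weights would be learned from labeled face and non-face windows; here they only demonstrate the data flow.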

Neural networks are a popular technique for pattern recognition problems, including face detection. The first advanced neural approach that reported results on a large, difficult dataset was published by Rowley et al. [198]. Their system incorporates face knowledge in a retinotopically connected neural network, shown in Fig. 10.1. The network is designed to analyze windows of 20 × 20 pixels. It has one hidden layer with 26 units: 4 units access 10 × 10 pixel subregions, 16 look at 5 × 5 subregions, and 6 receive input from overlapping horizontal stripes of size 20 × 5. The input window is preprocessed by lighting correction and histogram equalization. Recently, Rowley et al. [199] combined this system with a router neural network to detect faces at any rotation angle in the image plane.
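The receptive-field layout and the two preprocessing steps described above can be illustrated with the following Python sketch. Several details are assumptions of ours, not taken from the text: the six stripes start at rows 0, 3, ..., 15 (so neighbouring stripes overlap by two rows), lighting correction subtracts a least-squares brightness plane fitted over the whole window, histogram equalization maps each pixel to its empirical CDF, and all units use tanh. The function names and the random weights are hypothetical.

```python
import bisect
import math
import random
from typing import List, Tuple

Image = List[List[float]]
Field = Tuple[int, int, int, int]  # (top, left, height, width) inside the window
WIN = 20


def receptive_fields() -> List[Field]:
    """26 hidden-unit fields: 4 of 10x10, 16 of 5x5, 6 horizontal stripes of 20x5."""
    fields = [(t, l, 10, 10) for t in (0, 10) for l in (0, 10)]
    fields += [(t, l, 5, 5) for t in range(0, WIN, 5) for l in range(0, WIN, 5)]
    # Assumed stripe offsets 0, 3, 6, 9, 12, 15: six 5-row stripes overlapping by 2 rows.
    fields += [(t, 0, 5, WIN) for t in range(0, 16, 3)]
    return fields


def correct_lighting(img: Image) -> Image:
    """Subtract the least-squares plane a*x + b*y + c from a square window."""
    n = len(img)
    cx = (n - 1) / 2.0
    # With centered coordinates on a full grid, the normal equations are diagonal.
    sxx = sum((x - cx) ** 2 for x in range(n)) * n
    a = sum((x - cx) * img[y][x] for y in range(n) for x in range(n)) / sxx
    b = sum((y - cx) * img[y][x] for y in range(n) for x in range(n)) / sxx
    c = sum(map(sum, img)) / (n * n)
    return [[img[y][x] - (a * (x - cx) + b * (y - cx) + c) for x in range(n)]
            for y in range(n)]


def equalize(img: Image) -> Image:
    """Histogram equalization: each pixel becomes its empirical CDF, scaled to [0, 255]."""
    flat = sorted(v for row in img for v in row)
    n = len(flat)
    return [[255.0 * bisect.bisect_right(flat, v) / n for v in row] for row in img]


def forward(img: Image, fields: List[Field], hw: List[List[float]],
            hb: List[float], ow: List[float], ob: float) -> float:
    """Each hidden unit sees only its own field; output > 0 is read as 'face'."""
    hidden = []
    for (t, l, h, w), weights, bias in zip(fields, hw, hb):
        pixels = [img[r][c] for r in range(t, t + h) for c in range(l, l + w)]
        hidden.append(math.tanh(bias + sum(wi * p for wi, p in zip(weights, pixels))))
    return math.tanh(ob + sum(wi * hi for wi, hi in zip(ow, hidden)))


# Usage with random, untrained weights on a random 20x20 window.
rng = random.Random(0)
fields = receptive_fields()
hw = [[rng.uniform(-0.01, 0.01) for _ in range(h * w)] for (_, _, h, w) in fields]
hb = [0.0] * len(fields)
ow = [rng.uniform(-1.0, 1.0) for _ in fields]
window = [[rng.uniform(0.0, 255.0) for _ in range(WIN)] for _ in range(WIN)]
x = equalize(correct_lighting(window))
print(len(fields), forward(x, fields, hw, hb, ow, 0.0))
```

Because every hidden unit only reads pixels inside its own field, the network has 1400 input-to-hidden weights instead of the 26 × 400 a fully connected layer would need.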

Apart from linear subspace methods and neural networks, there are several other statistical approaches to image-based face detection, such as systems based on information theory or on support vector machines. For example, Schneiderman and Kanade [206] use products of histograms of wavelet coefficients. They employ multiple views to detect 3D objects, such as cars and faces, in different poses. Support vector machines are used, e.g., by Heisele et al. [92], who describe a one-step detector for entire faces and a component-based hierarchical detector.
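One way to read "products of histograms" is as a naive-Bayes-style likelihood ratio: each feature position contributes a histogram probability under the face and non-face classes, and the product turns into a sum of log ratios. The Python sketch below is our simplified reading, not Schneiderman and Kanade's full detector. It assumes the wavelet coefficients have already been quantized into integer codes per feature position (that step is not shown), and the function names, toy histograms, `eps` smoothing, and `threshold` are hypothetical.

```python
import math
from typing import Dict, Sequence

# One histogram per feature position: quantized code -> estimated probability.
Histogram = Dict[int, float]


def log_likelihood_ratio(codes: Sequence[int],
                         face_hists: Sequence[Histogram],
                         nonface_hists: Sequence[Histogram],
                         eps: float = 1e-6) -> float:
    """log(prod_k P(code_k | face) / prod_k P(code_k | non-face)).

    eps keeps the ratio finite for codes never seen during training.
    """
    total = 0.0
    for code, hf, hn in zip(codes, face_hists, nonface_hists):
        total += math.log((hf.get(code, 0.0) + eps) / (hn.get(code, 0.0) + eps))
    return total


def is_face(codes: Sequence[int], face_hists: Sequence[Histogram],
            nonface_hists: Sequence[Histogram], threshold: float = 0.0) -> bool:
    return log_likelihood_ratio(codes, face_hists, nonface_hists) > threshold


# Toy histograms for two feature positions with three possible codes each.
face = [{0: 0.7, 1: 0.2, 2: 0.1}, {0: 0.6, 1: 0.3, 2: 0.1}]
nonface = [{0: 0.2, 1: 0.3, 2: 0.5}, {0: 0.1, 1: 0.4, 2: 0.5}]
print(is_face([0, 0], face, nonface), is_face([2, 2], face, nonface))  # True False
```

Working in log space avoids numerical underflow when many small probabilities are multiplied, and raising `threshold` trades detections for fewer false positives.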

Searching for feature combinations, matching features with translated, rotated, and scaled face models, and scanning windows over all positions and scales are time-consuming procedures that may limit the applicability of the above methods to real-time tasks. Furthermore, heuristics must be employed to prevent multiple detections of the same face at nearby locations or scales.
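Both costs mentioned above can be made concrete with a short Python sketch. `count_windows` counts how many fixed-size windows an exhaustive scan visits over an image pyramid; the 20-pixel window, the scale step of 1.2, and the stride of 1 are illustrative assumptions. `suppress_duplicates` is greedy non-maximum suppression based on intersection over union, one common heuristic for merging overlapping detections; it is not claimed to be the heuristic used by any of the cited systems, and the 0.3 overlap threshold is arbitrary.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float, float]  # (x, y, width, height, score)


def count_windows(width: int, height: int, win: int = 20,
                  scale: float = 1.2, stride: int = 1) -> int:
    """Number of win x win windows visited when scanning every pyramid level."""
    total, s = 0, 1.0
    while True:
        w, h = int(width / s), int(height / s)
        if w < win or h < win:
            return total
        total += ((w - win) // stride + 1) * ((h - win) // stride + 1)
        s *= scale


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix = max(0.0, min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def suppress_duplicates(boxes: List[Box], max_overlap: float = 0.3) -> List[Box]:
    """Greedy NMS: keep the best-scoring box, drop boxes overlapping a kept one."""
    kept: List[Box] = []
    for box in sorted(boxes, key=lambda b: b[4], reverse=True):
        if all(iou(box, k) <= max_overlap for k in kept):
            kept.append(box)
    return kept


print(count_windows(320, 240))
dets = [(0, 0, 20, 20, 0.9), (2, 2, 20, 20, 0.8), (100, 100, 24, 24, 0.7)]
print(suppress_duplicates(dets))  # keeps the 0.9 and 0.7 detections
```

Even a modest 320 × 240 image yields tens of thousands of candidate windows, each of which a window-based classifier must evaluate.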

In the following, a method is described that uses an instantiation of the Neural Abstraction Pyramid architecture, introduced in Chapter 4, to localize a face in gray-scale still images. The network operates by iteratively refining an initial solution. Multiresolution versions of entire images are presented directly to the network, and it is trained with supervision to localize the face as fast as possible. Thus, no
