10. Face Localization
One of the major tasks in human-computer interface applications, such as face
recognition and video-telephony, is the exact localization of a face in an image.
In this chapter, I use the Neural Abstraction Pyramid architecture to solve this
problem, even in the presence of complex backgrounds, difficult lighting, and noise.
The network is trained using a database of gray-scale still images to reproduce
manually determined eye coordinates. It is able to generate reliable and accurate eye
coordinates for unknown images by iteratively refining an initial solution.
The performance of the proposed approach is evaluated against a large test set.
It is also shown that a moving face can be tracked. The fast network update allows
for real-time operation.
10.1 Introduction to Face Localization
To make the interface between humans and computers more pleasant, computers
must adapt to the users. One prerequisite for adaptation is that the computer
perceives the user. An important step for many human-computer interface applications,
like face recognition, lip reading, reading of the user's emotional state, and
video-telephony, is the localization of the user's face in a captured image. This is a
task that humans perform very well and without noticeable effort, whereas current
computer vision systems still have difficulties with it.
An extensive body of literature exists on face detection and localization problems.
Recently, Hjelmas and Low [99] published a survey on automatic face detection
methods. They distinguish between feature-based and image-based methods.
Feature-based methods are further classified into three groups: those relying on
low-level features, such as edges, motion, and skin color; those searching for
higher-level features, such as a pair of eyes; and those using active shape models,
such as snakes or deformable templates.
An example of a feature-based method that uses edges is the approach taken
by Govindaraju [83], where edges are extracted, labeled, and matched against a
predefined face model. A similar system is described by Jesorsky et al. [111]. It
consists of an edge extraction stage, a coarse localization that uses a face model, a
fine localization that relies on an eye model, as well as a multi-layer perceptron for
the exact localization of the pupils.
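To make the edge extraction stage mentioned above more concrete, the following is a minimal sketch of one common way to obtain a binary edge map from a gray-scale image. It assumes a Sobel gradient operator followed by a fixed threshold; the text does not state which edge operator the cited systems use, so this choice, the function name sobel_edges, and the threshold value are illustrative assumptions only.

```python
def sobel_edges(image, threshold):
    """Return a binary edge map for a gray-scale image given as a list of rows.

    Assumption: Sobel gradients with a fixed magnitude threshold. This is an
    illustrative choice, not the operator used by the cited systems.
    """
    height, width = len(image), len(image[0])
    edges = [[0] * width for _ in range(height)]
    # Border pixels are left at 0 because the 3x3 window does not fit there.
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            # Horizontal gradient: right column minus left column.
            gx = (image[y - 1][x + 1] + 2 * image[y][x + 1] + image[y + 1][x + 1]
                  - image[y - 1][x - 1] - 2 * image[y][x - 1] - image[y + 1][x - 1])
            # Vertical gradient: bottom row minus top row.
            gy = (image[y + 1][x - 1] + 2 * image[y + 1][x] + image[y + 1][x + 1]
                  - image[y - 1][x - 1] - 2 * image[y - 1][x] - image[y - 1][x + 1])
            magnitude = (gx * gx + gy * gy) ** 0.5
            if magnitude >= threshold:
                edges[y][x] = 1
    return edges


# Toy image with a vertical step edge between columns 2 and 3.
image = [[0, 0, 0, 255, 255, 255] for _ in range(5)]
edge_map = sobel_edges(image, threshold=128)
for row in edge_map:
    print(row)
```

In the toy image, only the interior pixels next to the intensity step are marked as edges. In a feature-based system, such an edge map would then be passed to the model-matching stage.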