Digital Signal Processing Reference
1. INTRODUCTION
Biometric person recognition technologies include recognition of faces,
fingerprints, voice, signature strokes, iris and retina scans, and gait. Person
recognition in general encompasses two different but closely related tasks:
identification and verification. The former refers to identifying a person
from a set of candidates using her/his biometric data, while the latter refers to
confirming a person's claimed identity from her/his biometric data. It is generally agreed that no single
biometric technology will meet the needs of all potential recognition
applications. Although the performance of each of these biometric technologies
has been studied individually, there is relatively little work reported in the
literature on the fusion of the results of various biometric technologies [1].
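The distinction between identification and verification can be illustrated with a minimal sketch. It assumes, purely for illustration, that each enrolled person is represented by a fixed-length feature vector and that similarity is measured by cosine similarity. The function names, the toy templates, and the threshold value are hypothetical and are not taken from the text.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def identify(probe, enrolled):
    """Identification: pick the best-matching person among all candidates."""
    return max(enrolled, key=lambda name: cosine_similarity(probe, enrolled[name]))

def verify(probe, claimed_id, enrolled, threshold=0.9):
    """Verification: accept or reject a single claimed identity.

    The threshold is an illustrative assumption, not a recommended value.
    """
    return cosine_similarity(probe, enrolled[claimed_id]) >= threshold

# Toy enrolled templates (hypothetical feature vectors).
enrolled = {
    "alice": [1.0, 0.0, 0.2],
    "bob": [0.1, 1.0, 0.0],
}
probe = [0.9, 0.1, 0.2]

best_match = identify(probe, enrolled)           # one-of-N decision
claim_alice = verify(probe, "alice", enrolled)   # yes/no decision
claim_bob = verify(probe, "bob", enrolled)       # yes/no decision
```

Identification compares the probe against every candidate and returns the best match, whereas verification compares it against one claimed identity and returns an accept/reject decision.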
A particular problem in multi-modal biometric person identification,
which has a wide variety of applications, is the speaker identification
problem where basically two modalities exist: audio signal (voice) and video
signal. Speaker identification, when performed over audio streams, is
probably one of the most natural ways to perform person identification.
However, the video stream is also an important source of biometric information:
it provides still images of biometric features such as the face, as well as
temporal motion information such as lip movement, which is correlated with the audio
stream. Most speaker identification systems rely on audio-only data [2].
However, especially under noisy conditions, such systems are far from being
perfect for high security applications. The same observation is also valid for
systems using only visual data, where poor picture quality or changes in
lighting conditions significantly degrade performance [3,4]. A better
alternative is the use of both modalities in a single identification scheme.
Person identification has a variety of applications at various levels of
security. A possible low security level application could be the identification
of a specific driver/passenger in a car, which enables various personal control
services for the driver or the passenger. Speaker identification performance
usually degrades under adverse environmental conditions such as car noise,
and a multi-modal identification system helps to maintain a high level of
reliability for the driver/passenger identification task. The visual data could
be available through a camera located on the visor.
The design of a multi-modal identification system consists of two basic
problems. The first problem is to represent the raw data acquired for each
modality with a meaningful and robust set of features, which has to be
individually able to discriminate samples belonging to different classes under