Chapter 16
JOINT AUDIO-VIDEO PROCESSING FOR ROBUST BIOMETRIC SPEAKER IDENTIFICATION IN CAR¹
Engin Erzin, Yücel Yemez, A. Murat Tekalp
Multimedia, Vision and Graphics Laboratory, College of Engineering, Koç University, Istanbul, 34450, Turkey
Email: eerzin@ku.edu.tr
Abstract:
In this chapter, we present our recent results on a multilevel Bayesian decision fusion scheme for the multimodal audio-visual speaker identification problem. The objective is to improve recognition performance over conventional decision fusion schemes. The proposed system decomposes the information in a video stream into three components: speech, lip trace, and face texture. Lip trace features are extracted from the 2D-DCT of successive active lip frames. The mel-frequency cepstral coefficients (MFCC) of the corresponding speech signal are extracted in parallel with the lip features. The resulting two parallel and synchronous feature streams are used to train and test a two-stream Hidden Markov Model (HMM) based identification system. Face texture images are treated separately in the eigenface domain and integrated into the system through decision fusion. Reliability-based ordering in multilevel decision fusion is observed to provide significant robustness at all SNR levels.
Keywords:
Speaker identification, multi-modal, multilevel decision fusion, robustness, in-vehicle
¹ This work has been supported by the Scientific and Technical Research Council of Turkey (TUBITAK) under project EEEAG-101E038.
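To make the front end concrete, the following is a minimal sketch of how the two synchronous feature streams described in the abstract could be produced: MFCC vectors from the speech signal and 2D-DCT coefficients from the active lip frames, computed at a matching frame rate. The library choices (librosa, scipy), the 25 fps video rate, the 32x32 lip-frame size, and the number of retained coefficients are illustrative assumptions, not the chapter's actual configuration.

```python
import numpy as np
from scipy.fft import dctn   # 2D-DCT for lip-frame features (assumed toolkit)
import librosa               # MFCC extraction (assumed toolkit)

def lip_dct_features(lip_frame, keep=8):
    """2D-DCT of a grayscale lip frame; keep the top-left keep x keep
    low-frequency coefficients as the feature vector (a simple block crop
    is used here for brevity instead of zig-zag scanning)."""
    coeffs = dctn(lip_frame, norm='ortho')
    return coeffs[:keep, :keep].flatten()

def mfcc_features(speech, sr=16000, n_mfcc=13, hop_s=0.04):
    """MFCC vectors computed with a hop equal to the video frame period
    (40 ms ~ 25 fps) so the audio and lip streams stay frame-synchronous."""
    hop = int(sr * hop_s)
    return librosa.feature.mfcc(y=speech, sr=sr, n_mfcc=n_mfcc,
                                hop_length=hop).T  # shape: (frames, n_mfcc)

# Toy data: 1 s of audio and 25 synthetic 32x32 lip frames (hypothetical sizes).
audio = np.random.randn(16000).astype(np.float32)
lip_frames = np.random.rand(25, 32, 32)

audio_stream = mfcc_features(audio)                                # (~25, 13)
lip_stream = np.stack([lip_dct_features(f) for f in lip_frames])   # (25, 64)

# These two synchronous streams would then feed a two-stream HMM;
# the shapes here are illustrative, not the chapter's exact configuration.
print(audio_stream.shape, lip_stream.shape)
```

Tying the audio hop length to the video frame rate is what keeps the two streams frame-synchronous, which is the prerequisite for training the two-stream HMM described above.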