scale [5]. The resulting MFCC features are derived by applying the discrete cosine transform to the log-scaled filter-bank energies,

$$c_{k,n} = \sum_{m=1}^{M_f} \log(E_{k,m}) \cos\!\left[\frac{\pi n}{M_f}\left(m - \frac{1}{2}\right)\right], \qquad n = 1, \ldots, N,$$

where $M_f$ is the number of mel-scaled filter banks and $N$ is the number of MFCC features that are extracted. The MFCC feature vector for the $k$-th frame is defined as

$$\mathbf{c}_k = [c_{k,1}, c_{k,2}, \ldots, c_{k,N}]^T.$$
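As a concrete illustration (not from the original text), the following Python sketch performs this step, assuming the log-scaled mel filter-bank energies are already available; the function and argument names are illustrative.

```python
import numpy as np
from scipy.fftpack import dct

def mfcc_from_log_energies(log_mel_energies: np.ndarray, n_mfcc: int) -> np.ndarray:
    """DCT over log-scaled mel filter-bank energies.

    log_mel_energies: shape (num_frames, num_filter_banks), already
    log-scaled. Returns MFCCs of shape (num_frames, n_mfcc).
    """
    # Orthonormal type-II DCT along the filter-bank axis.
    cepstra = dct(log_mel_energies, type=2, axis=1, norm="ortho")
    # Keep c_1 .. c_N; c_0 (the overall log energy) is dropped, and the
    # number of filter banks must exceed n_mfcc.
    return cepstra[:, 1:n_mfcc + 1]
```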
The audio feature vector for the $k$-th frame is formed as a collection of the MFCC vector along with the first and second delta MFCCs,

$$\mathbf{f}_k^a = [\mathbf{c}_k^T \;\; \Delta\mathbf{c}_k^T \;\; \Delta^2\mathbf{c}_k^T]^T.$$
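The delta streams are commonly computed with a regression over a small window of neighboring frames; the sketch below assumes such a formulation (the window half-width `K = 2` is an assumed default, not stated in the text) and stacks the three streams into the audio feature vector.

```python
import numpy as np

def deltas(features: np.ndarray, K: int = 2) -> np.ndarray:
    """Regression-based deltas over a +/-K frame window."""
    # Repeat edge frames so every frame has K neighbors on each side.
    padded = np.pad(features, ((K, K), (0, 0)), mode="edge")
    denom = 2 * sum(n * n for n in range(1, K + 1))
    out = np.zeros_like(features, dtype=float)
    for t in range(features.shape[0]):
        for n in range(1, K + 1):
            out[t] += n * (padded[t + K + n] - padded[t + K - n])
    return out / denom

def audio_feature_vector(mfcc: np.ndarray) -> np.ndarray:
    """Stack MFCCs with first and second deltas per frame."""
    d = deltas(mfcc)
    dd = deltas(d)
    return np.hstack([mfcc, d, dd])
```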
The gray-scale intensity-based lip stream is transformed into the 2D-DCT domain, and each lip frame is then represented by the first $M$ DCT coefficients of the zig-zag scan, excluding the 0-th (DC) coefficient. The lip feature vector for the $i$-th lip frame is denoted by $\mathbf{f}_i^l$. As the audio features are extracted at a rate of 100 fps and the lip features are extracted at a rate of 15 fps, rate synchronization should be performed prior to the data fusion.
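A sketch of the lip feature extraction under these definitions follows; the separable 2D-DCT and the particular zig-zag traversal direction are standard conventions assumed here, not details given in the text.

```python
import numpy as np
from scipy.fftpack import dct

def zigzag_indices(h: int, w: int):
    """(row, col) pairs in zig-zag order over the anti-diagonals."""
    return sorted(((r, c) for r in range(h) for c in range(w)),
                  key=lambda rc: (rc[0] + rc[1],
                                  rc[0] if (rc[0] + rc[1]) % 2 else -rc[0]))

def lip_features(frame: np.ndarray, M: int) -> np.ndarray:
    """First M zig-zag 2D-DCT coefficients of a gray-scale lip frame,
    excluding the 0-th (DC) coefficient."""
    # Separable 2D-DCT: DCT along columns, then along rows.
    coeffs = dct(dct(frame.astype(float), type=2, axis=0, norm="ortho"),
                 type=2, axis=1, norm="ortho")
    order = zigzag_indices(*frame.shape)
    # order[0] is the DC term at (0, 0); keep the next M coefficients.
    return np.array([coeffs[r, c] for r, c in order[1:M + 1]])
```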
The lip features are computed using linear interpolation over the $\mathbf{f}_i^l$ sequence to match the 100 fps rate as follows:

$$\hat{\mathbf{f}}_k^l = (1 - \alpha_k)\,\mathbf{f}_i^l + \alpha_k\,\mathbf{f}_{i+1}^l,$$

where $i = \lfloor 15k/100 \rfloor$ and $\alpha_k = 15k/100 - i$.
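The synchronization step then amounts to resampling the 15 fps lip sequence at the 100 fps audio frame times. A minimal sketch, assuming the rates stated above and uniform frame timing:

```python
import numpy as np

def upsample_lip_features(lip_feats: np.ndarray,
                          lip_fps: float = 15.0,
                          audio_fps: float = 100.0) -> np.ndarray:
    """Linearly interpolate lip features to the audio frame rate.

    lip_feats: shape (num_lip_frames, M); returns (num_audio_frames, M).
    """
    t_lip = np.arange(lip_feats.shape[0]) / lip_fps
    # Audio frame times up to the last available lip frame.
    t_audio = np.arange(0.0, t_lip[-1] + 1e-9, 1.0 / audio_fps)
    # Interpolate each feature dimension independently.
    return np.stack([np.interp(t_audio, t_lip, lip_feats[:, m])
                     for m in range(lip_feats.shape[1])], axis=1)
```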
The unimodal and fused temporal characterizations of the audio and the lip modalities are performed using Hidden Markov Models (HMMs), which are reliable structures for modeling the human hearing system and are thus widely used for speech recognition and speaker identification problems [2]. In this work, a word-level continuous-density HMM structure is built for the speaker identification task. Each speaker in the database population is modeled with a separate HMM and is represented by the feature sequence extracted over the audio/lip stream while uttering the secret phrase. First, a world HMM is trained over the whole training data of the population. Then each HMM associated with a speaker is trained over several repetitions of the audio-video utterance of the corresponding speaker. In the identification process, given a test feature set, each HMM structure produces a likelihood, and the speaker whose model yields the highest likelihood is selected.
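The text does not name an implementation; as one possible realization, the sketch below uses the third-party `hmmlearn` package with Gaussian emissions (a continuous-density HMM), trains one model per speaker, and selects the maximum-likelihood model at test time. The number of states and the omission of the world-model adaptation step are simplifying assumptions.

```python
import numpy as np
from hmmlearn import hmm

def train_speaker_models(train_data: dict[str, list[np.ndarray]],
                         n_states: int = 5) -> dict[str, hmm.GaussianHMM]:
    """Fit one continuous-density (Gaussian) HMM per speaker.

    train_data maps a speaker id to a list of feature sequences
    (each of shape (num_frames, feature_dim)) from repeated
    utterances of the secret phrase.
    """
    models = {}
    for speaker, sequences in train_data.items():
        X = np.vstack(sequences)
        lengths = [len(s) for s in sequences]
        model = hmm.GaussianHMM(n_components=n_states,
                                covariance_type="diag", n_iter=20)
        model.fit(X, lengths)
        models[speaker] = model
    return models

def identify(models: dict[str, hmm.GaussianHMM],
             test_seq: np.ndarray) -> str:
    """Return the speaker whose HMM gives the highest log-likelihood."""
    return max(models, key=lambda spk: models[spk].score(test_seq))
```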