Digital Signal Processing Reference
These likelihoods, together with the likelihood of the HMM representing the
world class, yield a log-likelihood ratio to be used in the multimodal
decision fusion.
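The decision rule described here can be sketched as a simple log-likelihood ratio test. The function names and the numeric log-likelihood values below are illustrative assumptions, not from the source; in practice the scores would come from forward-algorithm evaluation of the client and world HMMs.

```python
def log_likelihood_ratio(ll_client: float, ll_world: float) -> float:
    """Log-likelihood ratio between a client HMM and the world (background) model.

    Both inputs are log-likelihoods, so the ratio is a difference.
    """
    return ll_client - ll_world


def accept(ll_client: float, ll_world: float, threshold: float = 0.0) -> bool:
    # Accept the claimed identity when the ratio exceeds the threshold.
    return log_likelihood_ratio(ll_client, ll_world) > threshold


# Hypothetical log-likelihoods from scoring one utterance against both models.
print(accept(-1040.2, -1063.8))  # client model fits better -> True
```

The threshold trades off false acceptances against false rejections and is typically tuned on held-out impostor data.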
Two possible audio-lip fusion schemes are carried out using
concatenative data fusion [12] and multi-stream HMMs [20]. Concatenative
data fusion is based on the early integration model [7], where the
integration is performed in the feature space to form a composite feature
vector of audio and lip features. Hence the joint audio-lip feature
$f^{AL}_k$ is formed by combining the audio feature $f^{A}_k$ and the
interpolated lip features $f^{L}_k$ for the k-th audio-visual frame:

$$f^{AL}_k = \left[\, f^{A}_k \;\; f^{L}_k \,\right]$$
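A minimal sketch of this concatenative fusion follows. The frame counts, feature dimensions, and linear interpolation of the lip stream onto the audio frame rate are illustrative assumptions; the source does not specify the interpolation method or dimensionalities.

```python
import numpy as np


def fuse_concatenative(audio_feats: np.ndarray, lip_feats: np.ndarray) -> np.ndarray:
    """Early-integration fusion: interpolate lip features to the audio frame
    rate, then concatenate per frame.

    audio_feats: shape (K, Da), one row per audio frame.
    lip_feats:   shape (M, Dl), one row per (slower) video frame.
    Returns:     shape (K, Da + Dl), the joint audio-lip features.
    """
    K = audio_feats.shape[0]
    # Linearly interpolate each lip-feature dimension onto the K audio frames.
    src = np.linspace(0.0, 1.0, lip_feats.shape[0])
    dst = np.linspace(0.0, 1.0, K)
    lip_interp = np.stack(
        [np.interp(dst, src, lip_feats[:, d]) for d in range(lip_feats.shape[1])],
        axis=1,
    )
    return np.hstack([audio_feats, lip_interp])


# Illustrative dimensions: 13 audio features at 100 frames, 8 lip features
# at a lower video rate of 30 frames.
audio = np.random.randn(100, 13)
lip = np.random.randn(30, 8)
joint = fuse_concatenative(audio, lip)
print(joint.shape)  # (100, 21)
```

The resulting composite vectors can then be modeled by a single HMM, in contrast to the multi-stream approach, which keeps the audio and lip streams separate and combines their likelihoods.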
Figure 16-1. Multimodal speaker identification system.
4. EXPERIMENTAL RESULTS
The block diagram for the multimodal audio-visual speaker identification
system is given in Figure 16-1. The database that has been used to test the
performance of the proposed speaker identification system includes 50
subjects. Training and testing are performed over two independent sets of
recordings, each having five repetitions. A set of impostor data is also
collected, with each subject in the population uttering five different names