Digital Signal Processing Reference
These likelihoods, together with the likelihood of the HMM representing the
world class, yield a log-likelihood ratio to be used in the multimodal
decision fusion.
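The decision rule described here can be sketched as a simple log-likelihood ratio test. The function names and the numeric log-likelihood values below are illustrative assumptions, not from the source; in practice the scores would come from forward-algorithm evaluation of the client and world HMMs.

```python
def log_likelihood_ratio(ll_client: float, ll_world: float) -> float:
    """Log-likelihood ratio between a client HMM and the world (background) model.

    Both inputs are log-likelihoods, so the ratio is a difference.
    """
    return ll_client - ll_world


def accept(ll_client: float, ll_world: float, threshold: float = 0.0) -> bool:
    # Accept the claimed identity when the ratio exceeds the threshold.
    return log_likelihood_ratio(ll_client, ll_world) > threshold


# Hypothetical log-likelihoods from scoring one utterance against both models.
print(accept(-1040.2, -1063.8))  # client model fits better -> True
```

The threshold trades off false acceptances against false rejections and is typically tuned on held-out impostor data.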
Two possible audio-lip fusion schemes are carried out using
concatenative data fusion [12] and multi-stream HMMs [20]. Concatenative
data fusion is based on the early integration model [7], where the
integration is performed in the feature space to form a composite feature
vector of audio and lip features. Hence the joint audio-lip feature
$f^{AL}_k$ is formed by combining the audio feature $f^{A}_k$ and the
interpolated lip features $f^{L}_k$ for the k-th audio-visual frame:

$$f^{AL}_k = \left[\, f^{A}_k \;\; f^{L}_k \,\right]$$
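A minimal sketch of this concatenative fusion follows. The frame counts, feature dimensions, and linear interpolation of the lip stream onto the audio frame rate are illustrative assumptions; the source does not specify the interpolation method or dimensionalities.

```python
import numpy as np


def fuse_concatenative(audio_feats: np.ndarray, lip_feats: np.ndarray) -> np.ndarray:
    """Early-integration fusion: interpolate lip features to the audio frame
    rate, then concatenate per frame.

    audio_feats: shape (K, Da), one row per audio frame.
    lip_feats:   shape (M, Dl), one row per (slower) video frame.
    Returns:     shape (K, Da + Dl), the joint audio-lip features.
    """
    K = audio_feats.shape[0]
    # Linearly interpolate each lip-feature dimension onto the K audio frames.
    src = np.linspace(0.0, 1.0, lip_feats.shape[0])
    dst = np.linspace(0.0, 1.0, K)
    lip_interp = np.stack(
        [np.interp(dst, src, lip_feats[:, d]) for d in range(lip_feats.shape[1])],
        axis=1,
    )
    return np.hstack([audio_feats, lip_interp])


# Illustrative dimensions: 13 audio features at 100 frames, 8 lip features
# at a lower video rate of 30 frames.
audio = np.random.randn(100, 13)
lip = np.random.randn(30, 8)
joint = fuse_concatenative(audio, lip)
print(joint.shape)  # (100, 21)
```

The resulting composite vectors can then be modeled by a single HMM, in contrast to the multi-stream approach, which keeps the audio and lip streams separate and combines their likelihoods.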
Figure 16-1. Multimodal speaker identification system.
4. EXPERIMENTAL RESULTS
The block diagram for the multimodal audio-visual speaker identification
system is given in Figure 16-1. The database that has been used to test the
performance of the proposed speaker identification system includes 50
subjects. Training and testing are performed over two independent sets of
recordings, each having five repetitions. A set of impostor data is also
collected, with each subject in the population uttering five different names