Digital Signal Processing Reference
2.2 WTAll Decision Fusion
The conventional max rule given by Equation (7) can be modified to better
handle possible false identity claims. In this slightly modified scheme,
which we will refer to as winner modality takes all (WTAll), the likelihood
ratios in (7) are replaced by the confidence measures defined in (8).
In this way, a strong decision for rejection can also be taken into account
and favored, even when the corresponding likelihood ratio is not the
maximum over all available modalities.
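The fusion rule described above can be sketched in a few lines. This is a minimal illustration, not the chapter's exact formulation: it assumes each modality classifier emits a signed confidence measure (positive supporting acceptance of the claimed identity, negative supporting rejection), standing in for the confidence measures of Equation (8).

```python
# Sketch of winner-modality-takes-all (WTAll) fusion.
# Assumption: each modality returns a signed confidence measure,
# positive = accept the identity claim, negative = reject it.
# (The precise definition corresponds to Equation (8) in the text.)

def wtall_fuse(confidences):
    """Adopt the decision of the modality whose confidence has the
    largest magnitude, whether it argues for acceptance or rejection."""
    winner = max(confidences, key=abs)
    return winner > 0, winner  # (accept?, winning confidence)

# Example: speech and lip trace weakly accept, but the face modality
# strongly rejects, so the fused decision is rejection.
accept, conf = wtall_fuse([0.4, -2.1, 0.3])
```

Unlike a plain max over likelihood ratios, the winning modality here may be the one arguing for rejection, which is exactly the behavior motivated in the text.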
3. FEATURE EXTRACTION
In this section we consider a text-dependent multimodal speaker
identification task. The bimodal database consists of audio and video
signals belonging to individuals of a certain population. Each person in
the database utters a predefined secret phrase, which may vary from one
person to another. The objective is, given the data of an unknown person,
to determine whether this person matches someone in the database. The
person is identified if there is a match and rejected otherwise. The
multimodal system uses three feature sets extracted from each audio-visual
stream, corresponding to three modalities: face, lip trace, and speech.
Our goal is, at a minimum, not to fail whenever one of the individual
classifiers reaches the correct decision, and also to be robust against
false identity claims. The overall classification is based on the
theoretical framework presented in Section 2.
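The identification-with-rejection loop just described can be sketched as follows. All names here are hypothetical placeholders: `score` stands in for the per-modality matching scores, and the simple sum stands in for the fusion framework of Section 2.

```python
# Hypothetical sketch of closed-set identification with rejection.
# `score(modality, stream, templates)` is a placeholder for the
# per-modality matchers; summing the three scores stands in for
# the decision-fusion framework of Section 2.

def identify(av_stream, database, score, threshold=0.0):
    """Return the best-matching person id, or None to reject."""
    best_id, best_score = None, float("-inf")
    for person_id, templates in database.items():
        # Fuse the face, lip-trace, and speech scores for this candidate.
        fused = sum(score(m, av_stream, templates)
                    for m in ("face", "lip", "speech"))
        if fused > best_score:
            best_id, best_score = person_id, fused
    # Reject (false identity claim) if even the best match is weak.
    return best_id if best_score > threshold else None
```

The rejection threshold makes the system robust against claimants who are not in the database at all, rather than forcing a match to the nearest enrolled person.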
3.1 Face Modality
The eigenface technique [4], or more generally principal component
analysis, has proven to be an effective and powerful tool for the
recognition of still faces. The core idea is to reduce the dimensionality
of the problem by obtaining a smaller set of features than the original
set of pixel intensities. In
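The dimensionality-reduction step at the heart of the eigenface technique can be sketched with NumPy. This is a generic PCA-via-SVD illustration under the usual eigenface formulation, not the chapter's specific implementation; the function names are made up for the sketch.

```python
# Minimal eigenface-style sketch: project face images onto the top
# principal components of the training set. Function names are
# illustrative, not from the chapter.
import numpy as np

def eigenfaces(images, k):
    """images: (n_samples, n_pixels) array of flattened face images.
    Returns (mean, basis), where basis holds the top-k principal
    components (the "eigenfaces") as rows."""
    mean = images.mean(axis=0)
    centered = images - mean
    # SVD of the centered data yields the principal directions,
    # ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return mean, vt[:k]

def project(image, mean, basis):
    """Reduce one face image to its k eigenface coefficients."""
    return basis @ (image - mean)
```

A face is then represented by its k projection coefficients instead of thousands of raw pixel intensities, and recognition proceeds by comparing coefficient vectors.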