Digital Signal Processing Reference
where and is the decay coefficient of the Gibbs distribution
function, which can be used for likelihood normalization. The log likelihood
ratio is then defined as:
The log likelihood ratio defined in Equation (11) requires a universal
background class. For this, we adapt the faceness measure defined by the
authors in [4]: the eigenspace origin is used as the representative feature
vector of the face universal background class. Hence the distance for the
universal background class is taken as the distance from the feature vector
(the one that yields the minimum distance) to the universal background
model. The log likelihood ratio in (11) is computed for each class and can
be fused with decisions coming from other available modalities.
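As an illustration only, the following sketch shows how a distance-based score of this kind could be turned into a log likelihood ratio. It is not a reconstruction of Equation (11). It assumes a Gibbs-type likelihood proportional to exp(-D / sigma) with the same normalization constant for every class, Euclidean distances, a single feature vector per test sample, and the eigenspace origin (the zero vector) as the background model. The function name `log_likelihood_ratios` and its parameters are hypothetical.

```python
import numpy as np

def log_likelihood_ratios(feature, class_means, sigma=1.0):
    """Per-class log likelihood ratio against a background class.

    Assumptions (not taken from the source): p(f | class) is
    proportional to exp(-D / sigma) with a shared normalizer, D is the
    Euclidean distance, and the background model is the eigenspace
    origin, so the background distance is simply the norm of f.
    """
    feature = np.asarray(feature, dtype=float)
    class_means = np.asarray(class_means, dtype=float)
    # Distance from the feature vector to each class representative.
    d_class = np.linalg.norm(class_means - feature, axis=1)
    # Distance from the feature vector to the origin (background).
    d_background = np.linalg.norm(feature)
    # log p(f|class) - log p(f|background) under the shared normalizer.
    return (d_background - d_class) / sigma

ratios = log_likelihood_ratios([3.0, 4.0], [[3.0, 4.0], [0.0, 0.0]], sigma=2.0)
```

Under these assumptions a positive ratio means the feature vector lies closer to the class than to the background model, and a larger sigma flattens the scores, which is one way such a coefficient can act as a normalizer before fusion.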
3.2 Audio and Lip Modalities
The two synchronized modality streams, audio and lip, are used
separately and jointly to obtain reliable identification performance under
varying environmental conditions. Audio and lip features are extracted
separately from these synchronized streams at different rates. Hence a rate
adjustment is needed when the two modalities are fused.
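A minimal sketch of one possible rate adjustment is given here. The text does not state which method is used, so linear interpolation of the lip features onto the audio frame times is an assumption. The audio rate of 100 frames per second follows from the 10 ms frame shift used in this section. The 25 frames per second lip rate and the names `align_to_audio_rate` and `joint_features` are hypothetical.

```python
import numpy as np

def align_to_audio_rate(lip_feats, lip_rate_hz, audio_rate_hz, n_audio_frames):
    """Linearly interpolate lip features onto the audio frame time grid."""
    lip_feats = np.asarray(lip_feats, dtype=float)
    t_lip = np.arange(lip_feats.shape[0]) / lip_rate_hz
    t_audio = np.arange(n_audio_frames) / audio_rate_hz
    # np.interp handles 1-D data, so each feature dimension is
    # interpolated separately. Times past the last lip frame are
    # clamped to the last lip value.
    cols = [np.interp(t_audio, t_lip, lip_feats[:, k])
            for k in range(lip_feats.shape[1])]
    return np.stack(cols, axis=1)

def joint_features(audio_feats, lip_feats, lip_rate_hz=25.0, audio_rate_hz=100.0):
    """Concatenate audio features with rate-adjusted lip features."""
    audio_feats = np.asarray(audio_feats, dtype=float)
    lip_up = align_to_audio_rate(lip_feats, lip_rate_hz, audio_rate_hz,
                                 audio_feats.shape[0])
    return np.hstack([audio_feats, lip_up])

# Toy data: 9 audio frames of 13 coefficients, 3 lip frames of 1 value.
audio = np.zeros((9, 13))
lip = np.array([[0.0], [1.0], [2.0]])
fused = joint_features(audio, lip)
```

With these toy rates each lip frame spans four audio frames, so the interpolated lip column rises in steps of 0.25. Interpolating the slower stream up, rather than decimating the audio features, keeps the full temporal resolution of the audio stream in the joint feature vector.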
The audio stream is represented with mel-frequency cepstral
coefficients (MFCC), as they discriminate speech signals well. In our
system, the speech signal sampled at 16 kHz is analyzed in 25 ms frames
with a frame shift of 10 ms. Each frame is first multiplied by a Hamming
window and transformed to the frequency domain using the Fast Fourier
Transform (FFT). Mel-scaled triangular filter-bank energies are calculated
over the squared magnitude of the spectrum and represented in logarithmic