IN-CAR SPEECH RECOGNITION USING DISTRIBUTED MICROPHONES - DSP for In-Vehicle and Mobile Systems


2.3 Implementation

The technique is implemented as follows. The log-power spectrum is calculated through Mel-filter-bank analysis followed by a log operation [8]. The spectrum of the speech captured by the close-talking microphone is used as the speech at the source position S. All log-power spectra are normalized so that their means over an utterance become zero, i.e., the utterance mean is subtracted from each log-power spectrum.

Note that in this implementation, the minimisation of the regression error is equivalent to minimising the MFCC distance between the approximated and the target spectra, due to the orthogonality of the discrete cosine transform (DCT) matrix. Therefore, the MRLS has the same form as the maximum likelihood optimization of the filter-and-sum beamformer proposed in [5].
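
The steps of this section can be sketched numerically. The following is a minimal illustration on synthetic data, assuming NumPy; it is not the authors' implementation. The function names (`mean_normalize`, `fit_regression_weights`, `apply_regression`, `dct_matrix`), the array shapes, and the choice of one weight per microphone and MFB channel are assumptions made for the example. It normalizes the log-power spectra to zero utterance mean, fits regression weights by least squares, and checks that the squared error in the log-MFB domain equals the squared MFCC distance when an orthonormal DCT keeps all coefficients.

```python
import numpy as np

def mean_normalize(log_spec):
    """Subtract the utterance mean per channel. log_spec: (frames, channels)."""
    return log_spec - log_spec.mean(axis=0, keepdims=True)

def fit_regression_weights(distant, target):
    """Least-squares weights, fitted independently for each MFB channel.

    distant: (mics, frames, channels) normalized log-power spectra
    target:  (frames, channels) normalized close-talking log-power spectrum
    returns: (channels, mics)
    """
    mics, frames, channels = distant.shape
    weights = np.empty((channels, mics))
    for k in range(channels):
        X = distant[:, :, k].T                      # (frames, mics)
        weights[k] = np.linalg.lstsq(X, target[:, k], rcond=None)[0]
    return weights

def apply_regression(distant, weights):
    """Weighted sum over microphones: out[t, k] = sum_m w[k, m] * x[m, t, k]."""
    return np.einsum("km,mtk->tk", weights, distant)

def dct_matrix(n):
    """Orthonormal DCT-II matrix of size (n, n)."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    C = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    C[0] /= np.sqrt(2.0)
    return C

# Synthetic stand-ins for real log-MFB features (assumption for illustration).
rng = np.random.default_rng(0)
mics, frames, channels = 5, 200, 24
distant = rng.normal(size=(mics, frames, channels))
target = 0.3 * distant.sum(axis=0) + 0.1 * rng.normal(size=(frames, channels))

distant_n = np.stack([mean_normalize(d) for d in distant])
target_n = mean_normalize(target)

weights = fit_regression_weights(distant_n, target_n)
approx = apply_regression(distant_n, weights)

C = dct_matrix(channels)
err_log = np.sum((approx - target_n) ** 2)
err_mfcc = np.sum((approx @ C.T - target_n @ C.T) ** 2)
print(err_log, err_mfcc)
```

The two printed errors agree up to rounding because an orthonormal transform preserves Euclidean distance. If the cepstrum is truncated to fewer coefficients than MFB channels, the equality no longer holds exactly.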

3. AUTOMATIC ADAPTATION OF MRLS

In the previous report [4], we found that changing the regression weights adaptively to the driving conditions is effective in improving the recognition accuracy. In this section, we propose a method of discriminating in-car noise conditions, which are mainly affected by driving conditions, using the spatial distribution of noise signals, and of controlling the regression weights for MRLS accordingly. The basic procedure of the proposed method is as follows. 1) Cluster the noise signals, i.e., short-time non-speech segments preceding utterances, into several groups. 2) For each noise group, train optimal regression weights for MRLS, using the speech segments. 3) For unknown input speech, find the corresponding noise group from the background noise, i.e., the non-speech segments, and perform MRLS with the optimal weights for that noise cluster.
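
The three-step procedure above can be sketched as follows. This is a minimal illustration on synthetic data, assuming NumPy. The clustering algorithm is not specified at this point in the text, so a small k-means with farthest-first initialization is used as a stand-in. All names (`init_farthest_first`, `nearest_centroid`, `kmeans`, `train_cluster_weights`), the feature dimension, and the single-channel regression are assumptions for the example, not the authors' method.

```python
import numpy as np

def nearest_centroid(features, centroids):
    """Index of the closest centroid for each feature vector."""
    d = np.linalg.norm(features[:, None, :] - centroids[None, :, :], axis=2)
    return d.argmin(axis=1)

def init_farthest_first(features, n_clusters):
    """Deterministic init: start at the first vector, then add the farthest one."""
    chosen = [0]
    for _ in range(1, n_clusters):
        d = np.linalg.norm(
            features[:, None, :] - features[chosen][None, :, :], axis=2
        ).min(axis=1)
        chosen.append(int(d.argmax()))
    return features[chosen].copy()

def kmeans(features, n_clusters, n_iter=20):
    """Step 1: cluster noise feature vectors (stand-in clustering method)."""
    centroids = init_farthest_first(features, n_clusters)
    for _ in range(n_iter):
        labels = nearest_centroid(features, centroids)
        for c in range(n_clusters):
            members = features[labels == c]
            if len(members):
                centroids[c] = members.mean(axis=0)
    return centroids, nearest_centroid(features, centroids)

def train_cluster_weights(labels, distant, target, n_clusters):
    """Step 2: least-squares regression weights per noise cluster.

    distant: (utterances, frames, mics), target: (utterances, frames),
    both for a single MFB channel to keep the sketch short.
    """
    weights = {}
    for c in range(n_clusters):
        mask = labels == c
        if not np.any(mask):
            continue  # no training utterances fell into this cluster
        X = distant[mask].reshape(-1, distant.shape[-1])
        y = target[mask].reshape(-1)
        weights[c] = np.linalg.lstsq(X, y, rcond=None)[0]
    return weights

# Two synthetic "driving conditions" with well-separated noise features.
rng = np.random.default_rng(0)
n_per, frames, mics = 10, 50, 4
noise_feat = np.vstack([rng.normal(0.0, 0.1, (n_per, 3)),
                        rng.normal(5.0, 0.1, (n_per, 3))])
true_w = {0: np.array([0.4, 0.3, 0.2, 0.1]),
          1: np.array([0.1, 0.2, 0.3, 0.4])}
condition = np.repeat([0, 1], n_per)
distant = rng.normal(size=(2 * n_per, frames, mics))
target = np.stack([distant[u] @ true_w[condition[u]] for u in range(2 * n_per)])

centroids, labels = kmeans(noise_feat, n_clusters=2)
weights = train_cluster_weights(labels, distant, target, n_clusters=2)

# Step 3: pick the cluster of the noise preceding a new utterance.
new_noise = np.array([[5.0, 5.0, 5.0]])
cluster = int(nearest_centroid(new_noise, centroids)[0])
selected = weights[cluster]
print(cluster, selected)
```

In this sketch the weights are selected once per utterance from the non-speech segment that precedes it, which matches the description of using background noise to find the noise group.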

If there is a significant change in the sound source location, it greatly affects the relative intensity among distributed microphones. Therefore, in order to cluster the spatial noise distributions, we have developed a feature vector based on the relative intensity of the signals captured at the different positions to that of the nearest distant microphone, i.e.,

where is the relative power at the mel-filterbank (MFB) channel calculated from the microphone signal. We do not use
