2.3 Implementation
We use the following procedure to implement the technique. The
log-power spectrum is calculated through Mel-filter-bank analysis
followed by a log operation [8]. The spectrum of the speech captured by
the close-talking microphone is used as the speech at the source
position S. All log-power spectra are normalized so that their means
over an utterance become zero.
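The analysis and normalization steps above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names `mel_filterbank` and `normalized_log_mfb`, the triangular filter shape, the number of channels, and the flooring constant are all assumptions made for the example.

```python
import numpy as np

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular mel filters, shape (n_filters, n_fft // 2 + 1). Hypothetical helper."""
    hz_to_mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    mel_to_hz = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    # Filter edges equally spaced on the mel scale from 0 Hz to Nyquist.
    top_mel = hz_to_mel(sample_rate / 2.0)
    edges_hz = mel_to_hz(np.linspace(0.0, top_mel, n_filters + 2))
    bin_hz = np.linspace(0.0, sample_rate / 2.0, n_fft // 2 + 1)
    fb = np.zeros((n_filters, bin_hz.size))
    for k in range(n_filters):
        lo, mid, hi = edges_hz[k], edges_hz[k + 1], edges_hz[k + 2]
        rising = (bin_hz - lo) / (mid - lo)
        falling = (hi - bin_hz) / (hi - mid)
        fb[k] = np.maximum(0.0, np.minimum(rising, falling))
    return fb

def normalized_log_mfb(power_spec, fb, floor=1e-10):
    """Log mel-filter-bank output with its utterance mean removed.

    power_spec: (n_frames, n_bins) power spectrum of one utterance.
    Returns an array of shape (n_frames, n_filters) whose mean over
    frames is zero in every channel.
    """
    log_mfb = np.log(np.maximum(power_spec @ fb.T, floor))  # floor avoids log(0)
    return log_mfb - log_mfb.mean(axis=0, keepdims=True)
```

One consequence of the mean removal is that a constant gain applied to a whole utterance adds the same offset to every frame of the log spectrum and therefore cancels out.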
Note that in this implementation, minimizing the regression error is
equivalent to minimizing the MFCC distance between the approximated and
the target spectra, because the discrete cosine transform (DCT) matrix
is orthogonal and therefore preserves Euclidean distances. Therefore,
the MRLS has the same form as the maximum likelihood optimization of
the filter-and-sum beamformer proposed in [5].
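The equivalence between the log-spectral and cepstral errors can be checked numerically. The sketch below assumes an orthonormal DCT-II and a hypothetical 24 filter-bank channels, with random vectors standing in for real spectra. The equality is exact only when all cepstral coefficients are kept; truncating the cepstrum makes it approximate.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix of size n x n."""
    k = np.arange(n)[:, None]   # cepstral (row) index
    m = np.arange(n)[None, :]   # filter-bank channel (column) index
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * m + 1) * k / (2 * n))
    c[0, :] /= np.sqrt(2.0)     # rescale the DC row so that C @ C.T = I
    return c

rng = np.random.default_rng(0)
n_channels = 24                              # hypothetical number of MFB channels
C = dct_matrix(n_channels)
approx = rng.standard_normal(n_channels)     # stand-in for an approximated log spectrum
target = rng.standard_normal(n_channels)     # stand-in for the target log spectrum

err_log_spec = np.sum((approx - target) ** 2)          # error in the log-spectral domain
err_cepstral = np.sum((C @ approx - C @ target) ** 2)  # error in the cepstral domain
```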
3. AUTOMATIC ADAPTATION OF MRLS
In the previous report [4], we found that changing regression weights
adaptively to the driving conditions is effective in improving the
recognition accuracy. In this section, we propose a method of
discriminating in-car noise conditions, which are mainly affected by
driving conditions, using the spatial distribution of noise signals, and
of controlling the regression weights for MRLS accordingly. The basic
procedure of the proposed method is as follows.
1) Cluster the noise signals, i.e., short-time non-speech segments
preceding utterances, into several groups.
2) For each noise group, train optimal regression weights for MRLS,
using the speech segments.
3) For unknown input speech, find the corresponding noise group from
the background noise, i.e., the non-speech segments, and perform MRLS
with the optimal weights for that noise cluster.
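The three steps above can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: plain k-means stands in for the clustering method, ordinary least squares stands in for the weight training, a single log-spectral channel is regressed, and the names `nearest`, `kmeans`, `train_weights`, and `apply_mrls` are hypothetical.

```python
import numpy as np

def nearest(features, centroids):
    """Index of the closest centroid for each row of features."""
    d = ((features[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1)

def kmeans(features, n_clusters, n_iter=20, seed=0):
    """Plain k-means; returns centroids of shape (n_clusters, dim)."""
    rng = np.random.default_rng(seed)
    start = rng.choice(len(features), n_clusters, replace=False)
    centroids = features[start].copy()
    for _ in range(n_iter):
        labels = nearest(features, centroids)
        for c in range(n_clusters):
            if np.any(labels == c):  # keep the old centroid if a cluster empties
                centroids[c] = features[labels == c].mean(axis=0)
    return centroids

def train_weights(noise_feats, distant, close, n_clusters):
    """Steps 1 and 2: cluster the noise, then fit one weight vector per cluster.

    noise_feats: (n_samples, dim)    noise feature attached to each training sample
    distant:     (n_samples, n_mics) log-spectral values at the distant microphones
    close:       (n_samples,)        target value from the close-talking microphone
    """
    centroids = kmeans(noise_feats, n_clusters)
    labels = nearest(noise_feats, centroids)
    weights = []
    for c in range(n_clusters):
        # Least-squares fit restricted to the samples of this noise cluster.
        w, *_ = np.linalg.lstsq(distant[labels == c], close[labels == c], rcond=None)
        weights.append(w)
    return centroids, np.array(weights)

def apply_mrls(noise_feat, distant, centroids, weights):
    """Step 3: pick the noise cluster, then apply its regression weights."""
    c = nearest(noise_feat[None, :], centroids)[0]
    return distant @ weights[c]
```

At recognition time, only the non-speech segment preceding the utterance is needed to select the weight set, so the choice can be made before the speech itself is processed.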
A significant change in the sound source location greatly affects the
relative intensity among the distributed microphones. Therefore, in
order to cluster the spatial noise distributions, we have developed a
feature vector based on the intensity of the signals captured at the
different positions relative to that of the nearest distant microphone.
Each element of the vector is the relative power at one mel-filterbank
(MFB) channel calculated from one microphone signal. We do not use