Feature Compensation Employing Variational Model Composition for Robust Speech Recognition in In-Vehicle Environment - Digital Signal Processing for In-Vehicle Systems and Safety

addressing these noise conditions is that they might be changing depending on the specific car and road conditions. In this study, we select 20 speakers from approximately 100 speakers in Minneapolis, Minnesota (i.e., Release 1.1A) and employ the connected single digits portion that contains speech under a range of varying complex in-vehicle noise events/conditions.

11.3 Variational Model Composition

In this section, a novel method is proposed to effectively estimate the time-varying background noise contained in a speech utterance by using information contained in non-speech segments. As background for the discussion, we first present the effect on the log-spectral coefficients of adding a gain to the cepstral coefficients. The cepstrum is obtained by a discrete cosine transform (DCT) of the log-spectrum, so each order of the cepstral coefficients represents a frequency of change in the log-spectrum envelope (i.e., quefrency [14]). For example, the lower-order cepstral coefficients measure the slowly changing components of the log-spectrum envelope, with the 0th cepstral coefficient representing the DC component (i.e., energy) of the log-spectrum at a frame. Therefore, applying a weight to each order of the cepstral coefficients can generate a variation of the original cepstrum in terms of the frequency of envelope change along the log-spectral axis.

Assume that a vector of cepstral coefficients $\mathbf{x}$ consists of the 0th to $(N-1)$th coefficients. A variation of the cepstrum vector, $\tilde{\mathbf{x}}$, can be obtained by adding a gain vector $\mathbf{g}$ as follows:

$$\tilde{\mathbf{x}} = \mathbf{x} + \mathbf{g} \tag{11.1}$$
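To make the effect of Eq. (11.1) on the log-spectrum explicit, the relation can be written out under one assumed convention: an orthonormal DCT-II matrix $\mathbf{C}$ and an untruncated cepstrum (as many cepstral coefficients as log-spectral bins). Neither is specified in the text, and the symbols $\mathbf{l}$ (original log-spectrum), $\tilde{\mathbf{l}}$ (log-spectrum of the variation), $\mathbf{C}$, $\mathbf{e}_k$ (unit vector selecting the $k$th coefficient), and $a_k$ are introduced here for illustration only.

```latex
\begin{align*}
\mathbf{x} &= \mathbf{C}\,\mathbf{l}, \qquad
  \tilde{\mathbf{x}} = \mathbf{x} + \mathbf{g} \\
\tilde{\mathbf{l}} &= \mathbf{C}^{-1}\tilde{\mathbf{x}}
  = \mathbf{l} + \mathbf{C}^{-1}\mathbf{g} \\
\mathbf{g} = g\,\mathbf{e}_k \;\Rightarrow\;
\tilde{l}_n - l_n &= g\,a_k \cos\!\left(\frac{\pi k\,(2n+1)}{2N}\right),
\qquad a_0 = \sqrt{1/N},\quad a_{k \ge 1} = \sqrt{2/N}
\end{align*}
```

Under this convention, a gain on the 0th coefficient ($k = 0$) adds the same constant $g/\sqrt{N}$ to every log-spectral bin, while a gain on the $k$th coefficient with $k \ge 1$ adds a cosine that spans $k$ half-cycles across the log-spectral axis.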

If the gain is applied only on the 0th coefficient such as $\mathbf{g} = [g, 0, 0, \ldots, 0]$, the log-spectral coefficients of the obtained variation will have a different energy level from the original log-spectrum, which can be obtained by an inverse DCT of the cepstral coefficients. Figure 11.1(a) shows log-spectra of the variations which are generated by weighting the zeroth cepstral coefficient. The plain solid line indicates the original log-spectral coefficients, and the lines with solid or empty circles indicate the resulting log-spectrum by weighting $+g$ and $-g$ at the zeroth cepstral component, respectively. We can see the two variations have different energy levels while maintaining an identical spectral envelope shape with the original coefficients. Plots (b) and (c) present the log-spectra of the variations generated by applying weights only to the first and fourth cepstral components, respectively. The variations in (b) show a smooth change of the envelope, and the variations in (c) vary relatively faster.
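The behaviour described for the three plots can be reproduced numerically with a small sketch. It is not the authors' code: it assumes an orthonormal DCT-II, an untruncated cepstrum, and a made-up 16-bin log-spectrum (`toy_log_spec`). All function and variable names are hypothetical.

```python
import math


def dct_ortho(values):
    """Orthonormal DCT-II (assumed convention): log-spectrum -> cepstrum."""
    n = len(values)
    coeffs = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        acc = sum(v * math.cos(math.pi * k * (2 * i + 1) / (2 * n))
                  for i, v in enumerate(values))
        coeffs.append(scale * acc)
    return coeffs


def idct_ortho(coeffs):
    """Inverse of dct_ortho: cepstrum -> log-spectrum."""
    n = len(coeffs)
    values = []
    for i in range(n):
        acc = 0.0
        for k, c in enumerate(coeffs):
            scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
            acc += scale * c * math.cos(math.pi * k * (2 * i + 1) / (2 * n))
        values.append(acc)
    return values


def log_spectrum_change(log_spec, order, gain):
    """Add `gain` to one cepstral coefficient (Eq. 11.1 with a single
    non-zero entry in g) and return the per-bin change in the log-spectrum."""
    cep = dct_ortho(log_spec)
    cep[order] += gain
    varied = idct_ortho(cep)
    return [v - o for v, o in zip(varied, log_spec)]


def sign_changes(seq):
    """Count how often consecutive values switch sign."""
    return sum(1 for a, b in zip(seq, seq[1:]) if a * b < 0)


# Hypothetical toy log-spectrum; any smooth 16-bin curve would do.
N = 16
toy_log_spec = [10.0 + 3.0 * math.cos(2 * math.pi * i / N) + 0.5 * i
                for i in range(N)]

delta0 = log_spectrum_change(toy_log_spec, 0, 2.0)  # gain on 0th coefficient
delta1 = log_spectrum_change(toy_log_spec, 1, 2.0)  # gain on 1st coefficient
delta4 = log_spectrum_change(toy_log_spec, 4, 2.0)  # gain on 4th coefficient

print("spread of change, 0th:", max(delta0) - min(delta0))
print("sign changes, 1st:", sign_changes(delta1))
print("sign changes, 4th:", sign_changes(delta4))
```

With the gain on the 0th coefficient, the change is the same in every bin (a pure level shift). With the gain on the first or fourth coefficient, the change oscillates across the bins, and the fourth-order case switches sign more often, matching the faster envelope variation described for plot (c).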

With this motivation, we believe that a range of models could be generated by applying a combination of weights to an original model in the cepstral domain. In our proposed method, it is assumed that (1) a basis noise model can be obtained from periods of “silence” (e.g., non-speech) within the speech stream and (2) the target
