Digital Signal Processing Reference
addressing these noise conditions is that they might be changing depending on the
specific car and road conditions. In this study, we select 20 speakers from approximately
100 speakers in Minneapolis, Minnesota (i.e., Release 1.1A) and employ the
connected single digits portion that contains speech under a range of varying complex
in-vehicle noise events/conditions.
11.3 Variational Model Composition
In this section, a novel method is proposed to effectively estimate the time-varying
background noise contained in a speech utterance by using information contained in
non-speech segments. As initial knowledge for our discussion, the effect on log-
spectral coefficients caused by adding a gain to the cepstral coefficients is
presented. Because the cepstrum is obtained by a discrete
cosine transform (DCT) of the log-spectrum, each order of the
cepstral coefficients represents a frequency at which the log-spectrum envelope changes
(i.e., frequency [14]). For example, the lower-order cepstral coefficients measure
the slowly changing components in the envelope of the log-spectrum,
with the 0th cepstral coefficient representing the DC component (i.e., energy) of the
log-spectrum at a frame. Therefore, applying a weight to each order of the cepstral
coefficients could generate a variation of the original cepstrum in terms of the
frequency of envelope change along the log-spectral axis.
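As a worked illustration of this interpretation, one common convention is the unnormalized DCT-II. The text does not state which DCT variant or scaling is used, so the form below is an assumption, as are the symbols c_n (the n-th cepstral coefficient), \ell_k (the k-th log-spectral coefficient), and K (the number of log-spectral coefficients):

```latex
c_n = \sum_{k=0}^{K-1} \ell_k \cos\!\left(\frac{\pi n (2k+1)}{2K}\right),
\qquad n = 0, 1, \dots, K-1 .
% For n = 0 the cosine equals 1 for every k, so
c_0 = \sum_{k=0}^{K-1} \ell_k ,
% i.e. c_0 is proportional to the mean (DC level) of the log-spectrum.
```

Under this convention the basis function for order n completes about n half-cycles across the log-spectral axis, so a higher order corresponds to a faster change of the envelope.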
Assume that a vector of cepstral coefficients x consists of the 0th to (N − 1)th
coefficients. A variation of the cepstrum vector can be obtained by adding a gain
vector g as follows:
$$\tilde{\mathbf{x}} = \mathbf{x} + \mathbf{g} \qquad (11.1)$$
If the gain is applied only to the 0th coefficient, such as g = [g, 0, 0, ..., 0], the
log-spectral coefficients of the obtained variation will have a different energy level
from the original log-spectrum, which can be obtained by an inverse DCT of the
cepstral coefficients. Figure 11.1(a) shows log-spectra of the variations which are
generated by weighting the zeroth cepstral coefficient. The plain solid line indicates
the original log-spectral coefficients, and the lines with solid or empty circles
indicate the resulting log-spectrum by weighting +g and −g at the zeroth cepstral
component, respectively. We can see that the two variations have different energy levels
while maintaining the same spectral envelope shape as the original
coefficients. Plots (b) and (c) present the log-spectra of the variations generated
by applying weights only to the first and fourth cepstral components, respectively.
The variations in (b) show a smooth change of the envelope, whereas the
variations in (c) change relatively faster along the log-spectral axis.
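The behavior described for plots (a) to (c) can be checked numerically with a minimal sketch. It assumes an orthonormal DCT-II (the text does not specify the DCT variant or scaling) and uses a made-up 16-point log-spectrum; the helper names `dct_matrix`, `dct`, `idct`, `apply_gain`, and `sign_changes` are hypothetical and do not come from the source.

```python
import math

def dct_matrix(n):
    """Orthonormal DCT-II matrix (n x n); an assumed convention."""
    rows = []
    for i in range(n):
        scale = math.sqrt(1.0 / n) if i == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos(math.pi * i * (2 * k + 1) / (2 * n))
                     for k in range(n)])
    return rows

def dct(log_spec):
    """Log-spectrum -> cepstrum."""
    return [sum(c * v for c, v in zip(row, log_spec))
            for row in dct_matrix(len(log_spec))]

def idct(cep):
    """Cepstrum -> log-spectrum; the matrix is orthonormal, so inverse = transpose."""
    n = len(cep)
    c = dct_matrix(n)
    return [sum(c[i][k] * cep[i] for i in range(n)) for k in range(n)]

def apply_gain(cep, order, gain):
    """Eq. (11.1) with a gain vector that is zero except at index `order`."""
    g = [0.0] * len(cep)
    g[order] = gain
    return [x + gi for x, gi in zip(cep, g)]

def sign_changes(values):
    """Number of sign flips between neighbouring entries."""
    return sum(1 for a, b in zip(values, values[1:]) if a * b < 0)

# Toy log-spectrum with made-up values (illustration only).
log_spec = [math.log(1.0 + 0.5 * math.sin(0.4 * k) ** 2 + 0.1 * k)
            for k in range(16)]
cep = dct(log_spec)

# Change in the log-spectrum caused by a unit gain on orders 0, 1 and 4.
shift0 = [a - b for a, b in zip(idct(apply_gain(cep, 0, 1.0)), log_spec)]
shift1 = [a - b for a, b in zip(idct(apply_gain(cep, 1, 1.0)), log_spec)]
shift4 = [a - b for a, b in zip(idct(apply_gain(cep, 4, 1.0)), log_spec)]

print("order 0 spread:", max(shift0) - min(shift0))   # flat offset across all bins
print("order 1 sign changes:", sign_changes(shift1))  # slow variation
print("order 4 sign changes:", sign_changes(shift4))  # faster variation
```

Because the inverse DCT is linear, each shift depends only on the gain vector and not on the original log-spectrum: a gain on order 0 moves the whole log-spectrum up or down by a constant, while gains on higher orders add a cosine-shaped ripple that oscillates more often as the order increases.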
With this motivation, we believe that a range of models could be generated by
applying a combination of weights to an original model in the cepstral domain. In our
proposed method, it is assumed that (1) a basis noise model can be obtained from
periods of “silence” (e.g., non-speech) within the speech stream and (2) the target