Digital Signal Processing Reference
In-Depth Information
Table 11.1 Performance of baseline system and existing methods on CU-Move corpus (WER, %)

Method      WER (%)
Baseline    70.02
SS + CMN    39.90
ETSI AFE    48.31
VTS         31.45

and five females) in real-life in-vehicle conditions, which were collected in
Minneapolis, Minnesota [ 15 ]. Data was down-sampled to 8 kHz and reflected a
9.50 dB SNR on average which was obtained using NIST Speech Quality Assur-
ance software [ 18 ].
The performance of the baseline system (no compensation) is compared with
that of several existing preprocessing algorithms in terms of environmental
robustness for speech recognition. Spectral Subtraction (SS) and Cepstral Mean
Normalization (CMN) were selected as conventional algorithms. These represent
the most commonly used techniques for additive noise suppression and removal of
channel distortion, respectively. In spectral subtraction [ 19 ], the subtraction factor
and flooring factor were set to 4.0 and 0.2, respectively, and background noise was
estimated using the minimum statistics method with a time delay of approximately
250 ms. For cepstral mean normalization, the average value of the cepstrum over the
current input utterance was subtracted from each frame. The Advanced Front-End (AFE)
algorithm developed by ETSI, which contains an iterative Wiener filter and cepstral
histogram equalization [ 20 ], was also evaluated as one of the state-of-the-art methods.
For comparison, we also evaluated another feature compensation method, the Vector
Taylor Series (VTS) algorithm, in which the noisy speech GMM is adaptively
estimated using the EM algorithm over each test utterance [ 7 ]. Table 11.1
shows the performance of the baseline system and the existing algorithms.
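
The two conventional steps described above can be sketched as follows. This is a minimal illustration, not the implementation used in the experiments: the function names are hypothetical, the noise power estimate is assumed to be supplied by a separate tracker (the minimum statistics method is not reproduced here), and flooring against a fraction of the noise estimate is an assumption, since the text gives only the factor values.

```python
def spectral_subtract(noisy_power, noise_power, sub_factor=4.0, floor_factor=0.2):
    """Over-subtract a noise estimate from one frame's power spectrum.

    sub_factor and floor_factor default to the values quoted in the text.
    Flooring at floor_factor * noise is an assumption of this sketch.
    """
    cleaned = []
    for y, n in zip(noisy_power, noise_power):
        floor = floor_factor * n
        # Keep the subtracted value unless it drops below the spectral floor.
        cleaned.append(max(y - sub_factor * n, floor))
    return cleaned


def cepstral_mean_normalize(cepstra):
    """Subtract the utterance-level mean of each cepstral coefficient.

    cepstra is a list of frames, each frame a list of coefficients.
    """
    num_frames = len(cepstra)
    dim = len(cepstra[0])
    mean = [sum(frame[d] for frame in cepstra) / num_frames for d in range(dim)]
    return [[frame[d] - mean[d] for d in range(dim)] for frame in cepstra]


# Toy inputs, chosen only to show the behavior of each step.
print(spectral_subtract([10.0, 2.0, 1.0], [1.0, 1.0, 1.0]))
print(cepstral_mean_normalize([[1.0, 2.0], [3.0, 6.0]]))
```

With a subtraction factor as large as 4.0, low-energy bins are driven below zero and end up at the floor, which is why the flooring factor matters in practice.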
Next, we discuss how the perturbation factor for the proposed variational
model composition is determined, by showing performance as a function of the
perturbation factor. Performance was evaluated in terms of speech recognition
results on speech reconstructed by the PCGMM method, which employs the variational
model composition method. To see the performance in various types of background
noise conditions, the Aurora2 test database [ 16 ] was used. Here, we employed the Subway,
Babble, Car, and Exhibition noise conditions, which are included in “Set A” of the
Aurora2 database. Figure 11.4 presents the performance dependency on the perturbation
factor f p . The WER was plotted as a function of α from 0 to 0.1
for f p over four kinds of background noise conditions. Here, the WER is an average
value of all SNR conditions (i.e., 0, 5, 10, 15, and 20 dB) for each background noise,
and the plot with the solid circle presents the average performance of four kinds of
noise conditions. The case with α = 0 corresponds to the basic
PCGMM method employing only a basis model without the variational model
composition method, which is the reference system against which the
proposed VMC-PCGMM is compared. It is interesting to note that each plot is a
bowl-shaped curve with a local minimum at α values of around 0.05-0.07. These results
suggest that a suitable value for α needs to be determined for the
proposed variational noise model composition method to be effective.
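
The selection procedure described above (average the WER over SNR conditions for each noise type, average across noise types, then pick the factor at the minimum) can be sketched as follows. The function name and data layout are hypothetical, and the numbers in the usage example are placeholders made up for illustration; they are not values read from Figure 11.4.

```python
def mean(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


def select_perturbation_factor(wer):
    """Pick the perturbation factor with the lowest average WER.

    wer maps factor -> noise type -> list of WERs, one per SNR condition.
    Returns the best factor and the average WER for every factor.
    """
    avg_wer = {}
    for factor, by_noise in wer.items():
        # First average over SNR conditions within each noise type,
        # then average those per-noise values across noise types.
        per_noise = [mean(snr_wers) for snr_wers in by_noise.values()]
        avg_wer[factor] = mean(per_noise)
    best = min(avg_wer, key=avg_wer.get)
    return best, avg_wer


# Placeholder WERs (NOT from the source), two noise types and two SNRs each.
example = {
    0.0:  {"Subway": [10.0, 20.0], "Car": [20.0, 30.0]},
    0.05: {"Subway": [8.0, 16.0],  "Car": [16.0, 24.0]},
    0.1:  {"Subway": [9.0, 19.0],  "Car": [19.0, 29.0]},
}
best, averages = select_perturbation_factor(example)
print(best, averages)
```

A factor of 0 in this layout plays the role of the basic PCGMM reference, so its average can be compared directly against the selected factor.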