Digital Signal Processing Reference
Figure 9-5. Improvement of digit accuracy as a function of prosodic stream
weight in each SNR condition.
are averaged at 20, 10, and 5 dB SNR, respectively. In this experiment, the
prosodic feature P-DV was used, and insertion penalties were optimized. The
improvement using the SP-HMMs was observed over a wide range of prosodic
stream weights in all the noise conditions. Best results were obtained when
the prosodic stream weight was set around 0.6, irrespective of the SNR level.
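The role of the prosodic stream weight can be illustrated with a minimal sketch. It assumes the common multi-stream HMM convention in which per-stream log-likelihoods are combined as a weighted sum and the segmental and prosodic weights sum to one; the passage above does not state the formula, so this convention, the function name, and the parameter names are assumptions made for illustration only.

```python
def combined_log_likelihood(segmental_ll, prosodic_ll, prosodic_weight):
    """Combine per-stream log-likelihoods with a prosodic stream weight.

    Assumption (not stated in the text): the segmental weight is
    1 - prosodic_weight, so the two stream weights sum to one.
    """
    if not 0.0 <= prosodic_weight <= 1.0:
        raise ValueError("prosodic_weight must lie in [0, 1]")
    segmental_weight = 1.0 - prosodic_weight
    # Weighted sum in the log domain, i.e. a weighted product of likelihoods.
    return segmental_weight * segmental_ll + prosodic_weight * prosodic_ll


# Hypothetical per-frame log-likelihoods, chosen only to show the arithmetic.
score = combined_log_likelihood(-10.0, -4.0, prosodic_weight=0.6)
```

With a weight of 0 the score reduces to the segmental stream alone, and with a weight of 1 it depends only on the prosodic stream. Intermediate values such as 0.6 blend the two.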
Figure 9-6 shows the optimum insertion penalty as a function of the prosodic
stream weight in the white noise condition, when the prosodic feature P-DV
was used. In noisy conditions, if the prosodic stream weight is low, the
insertion penalty must be set high to compensate for the low reliability of
segmental features. Since prosodic features are effective for digit boundary
detection, the optimum insertion penalty decreases as the prosodic stream
weight increases. Similar results were obtained for other noise conditions.
The control range of the optimum insertion penalties in the best prosodic
stream weight condition is approximately half of the range for the condition
without prosodic information. This means that the prosodic features are
effective for robust adjustment of the insertion penalty.
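How an insertion penalty affects the choice between competing digit strings can be sketched as follows. This is an illustrative toy, not the decoder used in the experiment: it assumes the penalty is a positive cost subtracted once per recognized word, and the hypotheses and log-likelihood values are invented purely to show the mechanism.

```python
def hypothesis_score(word_log_likelihoods, insertion_penalty):
    """Total score of a hypothesis: acoustic log-likelihood minus a per-word cost.

    Assumption: insertion_penalty is a positive cost applied once per word,
    so a higher value discourages hypotheses that contain more words.
    """
    return sum(word_log_likelihoods) - insertion_penalty * len(word_log_likelihoods)


def best_hypothesis(hypotheses, insertion_penalty):
    """Return the digit string whose penalized score is highest."""
    best = max(hypotheses, key=lambda h: hypothesis_score(h[1], insertion_penalty))
    return best[0]


# Invented example: the three-digit string fits the noisy audio slightly
# better acoustically, but it contains a spurious inserted digit.
hypotheses = [
    ("3 5", [-20.0, -22.0]),
    ("3 8 5", [-13.0, -12.0, -14.0]),
]

without_penalty = best_hypothesis(hypotheses, insertion_penalty=0.0)
with_penalty = best_hypothesis(hypotheses, insertion_penalty=5.0)
```

In this toy case the unpenalized decoder prefers the string with the extra digit, while a penalty of 5 per word tips the decision to the shorter string. When digit boundaries are detected more reliably, spurious insertions score worse on their own, so less penalty is needed, which is consistent with the trend described above.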
As a supplementary experiment, we compared the boundary detection capability
of SP-HMMs and S-HMMs for digit recognition under noisy environments.
Noise-added utterances and clean utterances were segmented by both of these
models using the forced-alignment technique. The boundary detection