Chapter 9
NOISE ROBUST SPEECH RECOGNITION USING
PROSODIC INFORMATION
Koji Iwano, Takahiro Seki, Sadaoki Furui
Department of Computer Science, Tokyo Institute of Technology, 2-12-1 Ookayama,
Meguro-ku, Tokyo, 152-8552 Japan
Email: iwano@furui.cs.titech.ac.jp
Abstract
This paper proposes a noise robust speech recognition method for Japanese
utterances using prosodic information. In Japanese, the fundamental frequency
(F0) contour conveys phrase intonation and word accent information. Consequently,
it also conveys information about prosodic phrase and word boundaries. This paper
first proposes a noise robust F0 extraction method using the Hough transform,
which achieves high extraction accuracy under various noise environments. Then
it proposes a robust speech recognition method using syllable HMMs which model
both segmental spectral features and F0 contours. We use two prosodic features
combined with ordinary cepstral parameters: a derivative of the time function of
log F0 and a maximum accumulated voting value of the Hough transform
representing a measure of F0 continuity. Speaker-independent experiments
were conducted using connected digits uttered by 11 male speakers in various
kinds of noise and SNR conditions. It was confirmed that both prosodic features
improve the recognition accuracy in all noise conditions, and the effects are
additive. When using both prosodic features, the best absolute improvement of
digit accuracy is about 4.5%. This improvement was achieved by improving digit
boundary detection using the robust prosodic information.
Keywords:
noise robust speech recognition, prosody, fundamental frequency (F0), Hough transform
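
The two prosodic features named in the abstract can be illustrated with a small
sketch. This is not the authors' implementation: it assumes a hypothetical
"salience" window (frames by log-frequency bins, larger values meaning a more
likely F0 candidate) and a plain slope-intercept Hough transform over that
window. The function names, the window layout, and the candidate slopes are all
assumptions made for illustration. Under those assumptions, the slope of the
winning line approximates the time derivative of log F0 (in bins per frame),
and the winning accumulator value plays the role of the maximum accumulated
voting value used as a continuity measure.

```python
import numpy as np

def hough_line_vote(salience, slopes):
    """Accumulate votes for straight lines through a (frames x bins) window.

    Each candidate line is bin = b0 + slope * (t - t_center), where b0 is the
    bin at the window centre. Returns an accumulator of shape
    (len(slopes), n_bins), indexed by (slope index, b0).
    """
    n_frames, n_bins = salience.shape
    t_rel = np.arange(n_frames) - (n_frames - 1) / 2.0  # time relative to centre
    acc = np.zeros((len(slopes), n_bins))
    for i, slope in enumerate(slopes):
        for b0 in range(n_bins):
            bins = np.rint(b0 + slope * t_rel).astype(int)
            valid = (bins >= 0) & (bins < n_bins)  # ignore points outside the window
            acc[i, b0] = salience[np.nonzero(valid)[0], bins[valid]].sum()
    return acc

def prosodic_features(salience, slopes):
    """Return (best slope, centre bin, maximum vote) for one analysis window."""
    acc = hough_line_vote(salience, slopes)
    i, b0 = np.unravel_index(np.argmax(acc), acc.shape)
    return float(slopes[i]), int(b0), float(acc[i, b0])

# Synthetic window: a clean rising contour, one bin per frame.
window = np.zeros((9, 40))
for t in range(9):
    window[t, 16 + t] = 1.0
slope, centre_bin, vote = prosodic_features(window, np.array([-2.0, -1.0, 0.0, 1.0, 2.0]))
```

In this sketch a continuous contour collects votes from every frame in the
window, while scattered noise peaks rarely line up, so the maximum vote drops
when the contour is broken or absent. How the salience is computed and how the
features are combined with cepstral parameters are outside the scope of the
example.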