NOISE ROBUST SPEECH RECOGNITION USING PROSODIC INFORMATION - DSP for In-Vehicle and Mobile Systems

Chapter 9

NOISE ROBUST SPEECH RECOGNITION USING PROSODIC INFORMATION

Koji Iwano, Takahiro Seki, Sadaoki Furui

Department of Computer Science, Tokyo Institute of Technology
2-12-1 Ookayama, Meguro-ku, Tokyo, 152-8552 Japan
Email: iwano@furui.cs.titech.ac.jp

Abstract

This paper proposes a noise robust speech recognition method for Japanese utterances using prosodic information. In Japanese, the fundamental frequency (F0) contour conveys phrase intonation and word accent information. Consequently, it also conveys information about prosodic phrase and word boundaries. This paper first proposes a noise robust F0 extraction method using the Hough transform, which achieves high extraction accuracy under various noise environments. It then proposes a robust speech recognition method using syllable HMMs that model both segmental spectral features and F0 contours. We use two prosodic features combined with ordinary cepstral parameters: a derivative of the time function of log F0, and a maximum accumulated voting value of the Hough transform, which represents a measure of F0 continuity. Speaker-independent experiments were conducted using connected digits uttered by 11 male speakers under various noise types and SNR conditions. It was confirmed that both prosodic features improve the recognition accuracy in all noise conditions, and that their effects are additive. When both prosodic features are used, the best absolute improvement in digit accuracy is about 4.5%. This improvement was achieved through better digit boundary detection using the robust prosodic information.
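
The abstract does not spell out the voting procedure, so the following is only a minimal sketch of straight-line Hough voting over a short window, not the authors' implementation. It assumes a hypothetical 2-D array `window` (frames by frequency or log-frequency bins) holding per-frame pitch-candidate strengths; the function name `hough_line_vote` and the `slopes` grid are likewise illustrative assumptions. Under those assumptions, the largest accumulated vote plays the role of a continuity measure, and the slope of the winning line approximates a local time derivative of the contour.

```python
import numpy as np

def hough_line_vote(window, slopes):
    """Search straight lines through a (frames x bins) array of candidate strengths.

    Each line is parameterised by its slope (bins per frame) and its intercept
    (bin index at the centre frame). Returns (best_vote, best_slope, best_intercept).
    """
    n_frames, n_bins = window.shape
    frames = np.arange(n_frames)
    t = frames - n_frames // 2          # time offsets relative to the centre frame
    best = (-np.inf, 0.0, 0)
    for slope in slopes:
        for intercept in range(n_bins):
            # Bin visited by this line at every frame of the window.
            idx = np.rint(intercept + slope * t).astype(int)
            valid = (idx >= 0) & (idx < n_bins)
            # Accumulate the strengths lying on the line (the "vote").
            vote = window[frames[valid], idx[valid]].sum()
            if vote > best[0]:
                best = (vote, slope, intercept)
    return best
```

In this sketch a clean, continuous contour concentrates its strength on one line and yields a high vote, while noisy or unvoiced frames scatter strength across bins and yield a low one.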

Keywords: noise robust speech recognition, prosody, fundamental frequency, Hough transform
