1. INTRODUCTION
Increasing robustness is one of the most important issues in building
speech recognition systems for mobile and vehicular environments.
It has been found that human beings use prosodic information to increase
the robustness of speech recognition when acoustic information is unreliable
[1]. Since the fundamental frequency (F0) contour is one of the most important
features for conveying Japanese prosody, it is expected to be useful for
increasing the robustness of automatic speech recognition. F0 contour
information has already been used to improve the performance of Japanese
phoneme recognition in clean conditions [2]. However, with present technology,
it is not easy to automatically extract correct F0 values, especially in noisy
environments. Various techniques have been proposed to smooth out incorrect
values from a time series of extracted F0 values, but these methods are not
always successful.
This paper proposes a novel robust F0 extraction method, in which the Hough
transform is applied to a windowed time series of cepstral vectors extracted
from speech, instead of extracting F0 independently for each frame of speech.
Owing to its capability of extracting straight-line components from an image,
the Hough transform can extract a reliable F0 value for each window. By
shifting the window at every frame, a smooth time function of F0 can be
obtained.
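The windowed line search described above can be illustrated with a minimal
sketch. It is not the authors' implementation: it assumes a simplified
slope/center-bin line parameterization instead of the usual (rho, theta) form,
a hypothetical array layout (rows are frames, columns are quefrency bins), and
hypothetical function names. For each window it sums the cepstral values along
every candidate straight line and keeps the strongest one, so a single spurious
peak in one frame cannot outvote a line supported by the whole window.

```python
import numpy as np

def hough_line_peak(window, slopes):
    """Return (center_bin, slope, score) of the strongest straight line.

    window: 2-D array of shape (n_frames, n_bins); rows are frames and
            columns are quefrency bins (assumed layout, for illustration).
    slopes: candidate slopes in bins per frame (assumed search grid).
    """
    n_frames, n_bins = window.shape
    rows = np.arange(n_frames)
    offsets = rows - n_frames // 2          # frame offset from window center
    best = (0, 0.0, -np.inf)
    for slope in slopes:
        for c in range(n_bins):
            # Bin visited by this line in every frame of the window.
            cols = np.rint(c + slope * offsets).astype(int)
            valid = (cols >= 0) & (cols < n_bins)
            # Accumulate the evidence along the line (the "vote").
            score = window[rows[valid], cols[valid]].sum()
            if score > best[2]:
                best = (c, float(slope), float(score))
    return best

def track_peak_bins(cepstrogram, win_len, slopes):
    """Shift the window by one frame and keep each line's center bin."""
    n_frames = cepstrogram.shape[0]
    return [hough_line_peak(cepstrogram[t:t + win_len], slopes)[0]
            for t in range(n_frames - win_len + 1)]
```

If the quefrency axis is measured in samples, a center bin q corresponds to
F0 = fs / q for sampling rate fs, so the tracked bins map directly to a
smooth F0 contour.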
We also propose a speech recognition method using prosodic features
extracted by the Hough transform, consisting of a derivative of the time
function of F0 and/or a measure of periodicity. These features are combined
with ordinary cepstral parameters and modeled by multi-stream HMMs,
which are trained using clean speech. Since F0 contours represent phrase
intonation and word accent in Japanese utterances, prosodic features are
useful for detecting prosodic phrases and word boundaries. Therefore, the
proposed method using robust prosodic information is able to detect word
boundaries precisely and improve recognition performance in noisy
environments.
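As a rough illustration of how a multi-stream HMM combines the two kinds of
features, the sketch below uses the standard formulation in which a state's
output log-likelihood is a weighted sum of per-stream log-likelihoods. The
single diagonal Gaussian per stream, the stream weights, and the function names
are assumptions made for this example; the actual model topology and weights
are those described in Section 3.

```python
import math

def diag_gauss_logpdf(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at vector x."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))

def multistream_loglik(streams, weights):
    """Weighted sum of per-stream log-likelihoods for one HMM state.

    streams: list of (x, mean, var) tuples, one per stream, e.g. a
             segmental (cepstral) stream and a prosodic stream.
    weights: stream exponents (hypothetical values, chosen by the user).
    """
    return sum(w * diag_gauss_logpdf(x, m, v)
               for (x, m, v), w in zip(streams, weights))
```

Setting the prosodic weight to zero reduces the score to that of the
cepstral stream alone, which makes the contribution of the prosodic stream
easy to isolate.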
The paper is organized as follows. In Section 2, a robust F0 extraction
method using the Hough transform is proposed. Section 3 describes our
modeling scheme for noise-robust speech recognition using syllable HMMs
combining segmental and prosodic information. Experimental results are
reported in Section 4, and Section 5 concludes this paper.