3.3 Multi-stream Syllable HMMs
3.3.1 Basic Structure of Syllable HMMs. Since CV syllable transitions
and changes in prosodic characteristics such as “rising”, “falling” and “flat”
are highly related, the segmental and prosodic features are integrated using
syllable-unit HMMs. Our preliminary experiments showed that syllable-unit HMMs
achieve approximately the same digit recognition accuracy on a connected-digit
task as tied-state triphone HMMs.
The integrated syllable HMM, denoted “SP-HMM (Segmental-Prosodic HMM)”,
models both the phonetic context and the transition pattern. Table 9.1 lists the
SP-HMMs used in our experiments. Each Japanese digit uttered continuously
with other digits can be modeled by a concatenation of two context-dependent
syllables. Even “2” (/ni/) and “5” (/go/) can be modeled by two syllables, since
their final vowels are often lengthened to /ni:/ and /go:/. In our experiment,
the context of each syllable is considered only within each digit. Therefore, each
SP-HMM is denoted by either a left-context-dependent syllable “LC-SYL,PM”
or a right-context-dependent syllable “SYL+RC,PM”, where “PM” indicates a
transition pattern that is either rising (“U”), falling (“D”) or flat (“F”). For
example, the first syllable /i/ of “1” (/ichi/) with a rising transition
is denoted “i+chi,U”. Each SP-HMM has a standard left-to-right topology
in which the number of states is determined by the number of phonemes in the
syllable. The “sil” and “sp” models represent a silence between digit strings and
a short pause between digits, respectively.
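The naming convention above can be made concrete with a small sketch. The class `SPLabel` and the function `parse_sp_label` are hypothetical names introduced only for illustration, and the label “i-chi,D” in the usage note is an assumed example of the left-context form, not an entry taken from Table 9.1.

```python
from dataclasses import dataclass
from typing import Optional

# Transition-pattern codes as described in the text.
PATTERNS = {"U": "rising", "D": "falling", "F": "flat"}


@dataclass(frozen=True)
class SPLabel:
    """One SP-HMM name: a syllable, one context side, and a pattern code."""
    syllable: str
    pattern: str                  # "U", "D" or "F"
    left: Optional[str] = None    # set for the "LC-SYL,PM" form
    right: Optional[str] = None   # set for the "SYL+RC,PM" form

    def __post_init__(self) -> None:
        if self.pattern not in PATTERNS:
            raise ValueError(f"unknown pattern code: {self.pattern!r}")
        # Exactly one of the two context sides must be given.
        if (self.left is None) == (self.right is None):
            raise ValueError("give exactly one of left or right context")

    def __str__(self) -> str:
        if self.left is not None:
            return f"{self.left}-{self.syllable},{self.pattern}"
        return f"{self.syllable}+{self.right},{self.pattern}"


def parse_sp_label(text: str) -> SPLabel:
    """Parse a label such as 'i+chi , U' (spaces around ',' are ignored)."""
    unit, pattern = (part.strip() for part in text.split(","))
    if "+" in unit:
        syllable, right = (p.strip() for p in unit.split("+"))
        return SPLabel(syllable=syllable, pattern=pattern, right=right)
    if "-" in unit:
        left, syllable = (p.strip() for p in unit.split("-"))
        return SPLabel(syllable=syllable, pattern=pattern, left=left)
    raise ValueError(f"no context marker in label: {text!r}")
```

For instance, `parse_sp_label("i+chi , U")` yields a label whose syllable is `i`, whose right context is `chi`, and whose pattern is `U`; a left-context label such as the assumed `"i-chi,D"` parses to syllable `chi` with left context `i`.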
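3.3.2 Multi-stream Modeling. SP-HMMs are modeled as multi-stream
HMMs. In the recognition stage, the probability $b_j(o_{SP})$ of generating the
segmental-prosodic observation $o_{SP}$ at state $j$ is calculated by:
$$b_j(o_{SP}) = b_{S,j}(o_S)^{\lambda_S} \cdot b_{P,j}(o_P)^{\lambda_P}$$
where $b_{S,j}(o_S)$ is the probability of generating the segmental features $o_S$,
$b_{P,j}(o_P)$ is the probability of generating the prosodic features $o_P$, and
$\lambda_S$ and $\lambda_P$ are weighting factors for the segmental and prosodic
streams, respectively. They are constrained by $\lambda_S + \lambda_P = 1$.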
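As a minimal sketch of the stream weighting, the following assumes the usual multi-stream form, in which the state output probability is the product of the segmental and prosodic stream probabilities, each raised to its stream weight, and the two weights are non-negative and sum to one. The function name `combine_stream_log_probs` and the numeric values are hypothetical and chosen only for illustration. Working in the log domain turns the weighted product into a weighted sum, which avoids numerical underflow.

```python
import math


def combine_stream_log_probs(log_b_seg: float,
                             log_b_pro: float,
                             lambda_seg: float) -> float:
    """Return lambda_S * log b_S + lambda_P * log b_P for one state.

    log_b_seg and log_b_pro are the log output probabilities of the
    segmental and prosodic streams; lambda_seg is the segmental weight.
    """
    if not 0.0 <= lambda_seg <= 1.0:
        raise ValueError("lambda_seg must lie in [0, 1]")
    lambda_pro = 1.0 - lambda_seg  # assumed constraint: weights sum to one
    return lambda_seg * log_b_seg + lambda_pro * log_b_pro


# Illustrative values only; they are not taken from any experiment.
log_b = combine_stream_log_probs(math.log(0.2), math.log(0.5), lambda_seg=0.6)
```

Setting `lambda_seg=1.0` reduces the score to the segmental stream alone, and `lambda_seg=0.0` to the prosodic stream alone, so the weight can be read as a dial between the two information sources.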
3.3.3 Building SP-HMMs. Syllable HMMs for segmental and prosodic
features are trained separately and then combined to build SP-HMMs using a
tied-mixture technique as follows:
1 “S-HMMs (Segmental HMMs)” are trained by using only segmental features.
They are denoted by either
or
Here,