Digital Signal Processing Reference
(wild card) means that HMMs are built without considering the
transitions “U”, “D” or “F”. The total number of S-HMM states is the same
as the number of SP-HMM states. Twenty S-HMMs, including “sil” and “sp”,
are trained.
2. Training utterances are segmented into syllables by forced alignment
using the S-HMMs. One of the transition labels, “U”, “D” or “F”, is then
manually assigned to each segment according to its actual pattern.
3. “P-HMMs (Prosodic HMMs)”, each having a single state, are trained on
the prosodic features within these segments, according to the transition
label. Eight separate models, including “sil” and “sp”, are made. Each
P-HMM has a single state, since it has been found that syllabic contours
in Japanese can be approximated by a line function [4] and that the
prosodic feature value can be expected to be almost constant in each CV
syllable.
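The idea behind a single-state prosodic model can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes one scalar prosodic feature per frame, replaces the Gaussian mixture with a single Gaussian, and ignores the state transition probability. Under those assumptions, training a model for a label reduces to pooling all frames from the segments carrying that label and taking their sample mean and variance. The function names and the toy feature values are hypothetical.

```python
import math
from collections import defaultdict


def train_single_state_models(segments):
    """Fit one Gaussian per transition label (assumed simplification).

    segments: iterable of (label, frames) pairs, where frames is a list of
    scalar feature values from one labelled syllable segment.
    Returns a dict mapping label -> (mean, variance).
    """
    pooled = defaultdict(list)
    for label, frames in segments:
        pooled[label].extend(frames)  # a single state sees every frame

    models = {}
    for label, values in pooled.items():
        mean = sum(values) / len(values)
        var = sum((v - mean) ** 2 for v in values) / len(values)
        models[label] = (mean, max(var, 1e-6))  # floor to avoid zero variance
    return models


def log_likelihood(model, frames):
    """Sum of per-frame Gaussian log densities under one model."""
    mean, var = model
    return sum(
        -0.5 * (math.log(2.0 * math.pi * var) + (v - mean) ** 2 / var)
        for v in frames
    )


def classify(models, frames):
    """Return the label whose model scores the frames highest."""
    return max(models, key=lambda label: log_likelihood(models[label], frames))


# Toy, made-up segments: (transition label, feature values per frame).
segments = [
    ("U", [0.9, 1.1, 1.0]),
    ("U", [1.2, 0.8]),
    ("D", [-1.0, -0.9, -1.1]),
    ("F", [0.0, 0.1, -0.1]),
]
models = train_single_state_models(segments)
predicted = classify(models, [0.95, 1.05])
```

With a roughly constant feature inside each syllable, one state is enough to summarize a segment, which is why the pooled mean and variance carry most of the information in this simplified setting.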
4. The S-HMMs and P-HMMs are combined to make SP-HMMs. Gaussian mixtures
for the segmental feature stream of SP-HMMs are tied with the
corresponding S-HMM mixtures, while the mixtures for the prosodic feature
stream are tied with the corresponding P-HMM mixtures. Figure 9-3