Digital Signal Processing Reference
In-Depth Information
Table 5.12. Phoneme recognition of the German street name NOLDESTRASSE.

Phoneme Ref.            /n/   /O/   /l/   /d/   /@/   /S/   /t/   /r/   /a:/  /s/   /@/   -
hierarchy intra         -     /l/   /OY/  /d/   /@/   /S/   /t/   /a:/  /a/   /z/   /@/   /k/
                        d     s     s     c     c     c     c     s     s     s     c     i
hierarchy intra+inter   /n/   -     /OY/  /d/   /@/   /S/   /t/   /a:/  /a/   /z/   /@/   -
                        c     d     s     c     c     c     c     s     s     s     c     -
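The per-phoneme labels in Table 5.12 (c, s, d, i) are the kind of output a minimum-edit-distance alignment between reference and recognized sequences produces. The following is a minimal sketch of such an alignment, assuming unit costs for substitutions, deletions, and insertions; the function name align_labels is hypothetical, and the source does not state which scoring tool or cost setting produced the table. Because several alignments can share the same minimum cost, the labels may be placed differently from the table even when the error total agrees.

```python
def align_labels(ref, hyp):
    """Label a Levenshtein alignment of hyp against ref.

    Returns one label per alignment position:
    'c' correct, 's' substitution, 'd' deletion, 'i' insertion.
    Unit edit costs are an assumption of this sketch.
    """
    n, m = len(ref), len(hyp)
    # cost[i][j] = minimum number of edits turning ref[:i] into hyp[:j]
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i
    for j in range(1, m + 1):
        cost[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = cost[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            cost[i][j] = min(diag, cost[i - 1][j] + 1, cost[i][j - 1] + 1)

    # Walk back from the end, preferring match/substitution on ties.
    labels = []
    i, j = n, m
    while i > 0 or j > 0:
        if (i > 0 and j > 0
                and cost[i][j] == cost[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])):
            labels.append("c" if ref[i - 1] == hyp[j - 1] else "s")
            i, j = i - 1, j - 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            labels.append("d")  # reference phoneme missing from the hypothesis
            i -= 1
        else:
            labels.append("i")  # extra phoneme in the hypothesis
            j -= 1
    labels.reverse()
    return labels


# Sequences taken from Table 5.12 (intra-phonetic row).
reference = ["n", "O", "l", "d", "@", "S", "t", "r", "a:", "s", "@"]
intra = ["l", "OY", "d", "@", "S", "t", "a:", "a", "z", "@", "k"]

intra_labels = align_labels(reference, intra)
print(" ".join(intra_labels))
print("errors:", sum(label != "c" for label in intra_labels))
```

Counting the labels other than c gives the total number of errors for the utterance, which is the quantity a phoneme error rate is built from.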
[Figure 5.24: two panels, (a) Intra-phonetic scheme and (b) Combination scheme (intra+inter). Each panel plots curves for /n/, /O/, /l/, /d/, and /OY/ against Time (Frame Number), frames 20 to 60, on a vertical scale from 0.1 to 1.]
Fig. 5.24. Posterior features at the input of the Viterbi decoder for the German
street name NOLDESTRASSE. Only the first part of the utterance is shown
(NOLD).
mentioned before, for the combination scheme, an MLP takes a window of
posterior features generated by the intra-phonetic scheme and uses it to
generate the final posteriors.
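The windowing step described above can be sketched as follows. This is only an illustration under stated assumptions: the context size, the hidden-layer size, the tanh activation, and the randomly initialized (untrained) weights are all hypothetical, and the helper names stack_context and mlp_posteriors do not come from the source. The intra-phonetic posteriors are replaced here by random probability vectors.

```python
import numpy as np


def stack_context(posteriors, context):
    """Concatenate every frame with `context` neighbours on each side.

    posteriors: array of shape (T, K), one posterior vector per frame.
    Returns an array of shape (T, (2 * context + 1) * K).
    Edge frames are repeated as padding (an assumption of this sketch).
    """
    num_frames = posteriors.shape[0]
    padded = np.pad(posteriors, ((context, context), (0, 0)), mode="edge")
    shifted = [padded[k:k + num_frames] for k in range(2 * context + 1)]
    return np.concatenate(shifted, axis=1)


def mlp_posteriors(x, w1, b1, w2, b2):
    """One-hidden-layer MLP with a softmax output, one row per frame."""
    hidden = np.tanh(x @ w1 + b1)
    logits = hidden @ w2 + b2
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    expo = np.exp(logits)
    return expo / expo.sum(axis=1, keepdims=True)


rng = np.random.default_rng(0)
T, K = 50, 5                 # frames and phoneme classes (hypothetical sizes)
context, hidden_units = 4, 16  # hypothetical window and layer sizes

# Stand-in for the intra-phonetic posteriors: each row sums to 1.
intra_post = rng.dirichlet(np.ones(K), size=T)

x = stack_context(intra_post, context)

# Untrained weights, only to show the shapes involved.
w1 = rng.normal(scale=0.1, size=(x.shape[1], hidden_units))
b1 = np.zeros(hidden_units)
w2 = rng.normal(scale=0.1, size=(hidden_units, K))
b2 = np.zeros(K)

final_post = mlp_posteriors(x, w1, b1, w2, b2)
print(x.shape, final_post.shape)
```

Each output row is again a probability vector over the phoneme classes, so the final posteriors can be fed to the decoder in the same way as the intra-phonetic ones.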
It can be seen clearly in Figure 5.24(b) why the combination scheme
recognized the phoneme sequence /n OY d/. On the other hand, based on the
phoneme insertion penalty and minimum duration constraints, the
intra-phonetic scheme recognized the sequence /l OY d/, substituting /l/ for
the correct phoneme /n/ (see Figure 5.24(a)). However, information about the
correct phonemes is still present along the entire utterance, e.g., the phoneme /O/