[Figure 5.22: MFCC feature vectors are split into slices. Each slice feeds one MLP, and a MERGER combines the MLP outputs.]
(a) Setup A. 3 non-overlapping slices (3 MLPs). Total context: 27 frames.
(b) Setup B. 5 overlapping slices (5 MLPs). Total context: 29 frames.
(c) Setup C. 3 overlapping slices (3 MLPs). Total context: 19 frames.
Fig. 5.22. Intra-phonetic scheme as a split temporal context technique. Each slice consists of 9 concatenated MFCC feature vectors. Two adjacent overlapping slices share 4 MFCC feature vectors.
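The context lengths in the panel titles follow from the slice geometry. With n slices of 9 frames and adjacent slices sharing o frames, the slices span 9 + (n - 1)(9 - o) frames: 27 for Setup A (o = 0), 29 for Setup B and 19 for Setup C (o = 4). The helper below is a hypothetical sketch. Its name, `slice_offsets`, and the convention of measuring offsets from the centre frame of the whole context are assumptions, not taken from the text. It computes the total context and the frame offsets covered by each slice:

```python
def slice_offsets(n_slices, slice_len=9, overlap=4):
    """Return the total context length and the (first, last) frame offset of
    each slice, relative to the centre frame of the whole context."""
    hop = slice_len - overlap                   # frames between starts of adjacent slices
    total = slice_len + (n_slices - 1) * hop    # total context in frames
    first = -(total // 2)                       # offset of the leftmost frame
    slices = [(first + i * hop, first + i * hop + slice_len - 1)
              for i in range(n_slices)]
    return total, slices


# The three setups of Figure 5.22 (slice length 9 frames in all cases).
setups = {"A": (3, 0), "B": (5, 4), "C": (3, 4)}   # name -> (n_slices, overlap)
for name, (n, ov) in setups.items():
    total, slices = slice_offsets(n, overlap=ov)
    print(name, total, slices)
# A 27 [(-13, -5), (-4, 4), (5, 13)]
# B 29 [(-14, -6), (-9, -1), (-4, 4), (1, 9), (6, 14)]
# C 19 [(-9, -1), (-4, 4), (1, 9)]
```

In Setup B, for example, the first two slices both cover offsets -9 to -6, which are the 4 shared frames mentioned in the caption.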
Table 5.9. Phoneme accuracy (%) of each classifier shown in Figure 5.22(b).

Classifier     1-state   3-state
MLP^1_{-2}     55.80     60.04
MLP^1_{-1}     65.60     69.05
MLP^1_{0}      67.37     70.64
MLP^1_{1}      65.00     68.31
MLP^1_{2}      54.32     58.91
Table 5.10 shows the results of the proposed approach with 1-state and 3-state models. The first row gives the results for a single MLP with a 9-frame window, as shown in Figure 5.20.
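The input to the single-MLP baseline is formed by concatenating the MFCC vectors of a 9-frame window, that is, offsets -4 to +4 around the current frame. The sketch below shows one way to do this. It assumes 39-dimensional MFCC vectors and replicates the edge frame at utterance boundaries. Neither detail is stated in the text, and the function name `stack_context` is hypothetical. A slice of Setup B uses the same routine with shifted offsets:

```python
import numpy as np

def stack_context(features, center, offsets):
    """Concatenate the feature vectors at frames center + offset into one
    MLP input vector. Frames outside the utterance are replaced by the
    nearest edge frame (an assumed padding convention)."""
    n_frames = features.shape[0]
    idx = np.clip(center + np.asarray(offsets), 0, n_frames - 1)
    return features[idx].reshape(-1)


# Hypothetical utterance: 100 frames of 39-dimensional MFCC vectors.
features = np.arange(100 * 39, dtype=float).reshape(100, 39)

# Single MLP with a 9-frame window centred on frame 50.
x = stack_context(features, 50, range(-4, 5))
print(x.shape)   # (351,)

# Leftmost slice of Setup B (offsets -14 to -6), also 9 frames long.
x_left = stack_context(features, 50, range(-14, -5))
```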
The weights of the WAVGlog merger were chosen to emphasize the information at the center of the entire temporal context, i.e., a higher weight was given to the classifier in the middle. All weights are selected to sum to one.
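The text does not define WAVGlog precisely here. The sketch below assumes it denotes a weighted average of the classifiers' log posteriors, renormalized afterwards. It uses hypothetical weights for the five MLPs of Setup B, which peak at the central slice and sum to one. The function name `wavg_log_merge`, the probability floor, and the weight values are all illustrative assumptions:

```python
import numpy as np

def wavg_log_merge(posteriors, weights):
    """Merge per-class posteriors from several classifiers by a weighted
    average in the log domain (assumed reading of WAVGlog), then renormalise."""
    posteriors = np.asarray(posteriors, dtype=float)  # shape (n_classifiers, n_classes)
    weights = np.asarray(weights, dtype=float)        # shape (n_classifiers,)
    if not np.isclose(weights.sum(), 1.0):
        raise ValueError("merger weights must sum to one")
    # Floor the posteriors to avoid log(0).
    log_p = np.log(np.clip(posteriors, 1e-10, 1.0))
    merged = weights @ log_p              # weighted average of log posteriors, per class
    merged -= merged.max()                # shift for numerical stability before exp
    p = np.exp(merged)
    return p / p.sum()                    # renormalise to a probability distribution


# Hypothetical weights for the five MLPs of Setup B: largest for the central slice.
weights = [0.1, 0.2, 0.4, 0.2, 0.1]

# One frame, three phoneme classes: only the central MLP is confident.
post = np.full((5, 3), 1.0 / 3.0)
post[2] = [0.8, 0.1, 0.1]
merged = wavg_log_merge(post, weights)
```

Averaging in the log domain behaves like a weighted geometric mean. A class therefore keeps a high merged score only if no heavily weighted classifier assigns it a very low posterior.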