Extending the Hierarchical Scheme: Inter and Intra Phonetic Information - Hierarchical Neural Network Structures for Phoneme Recognition - page 88

(a) Setup A. 3 non-overlapped slices. Total context: 27 frames. [Diagram: MFCC features feed 3 MLPs, whose outputs feed a MERGER.]

(b) Setup B. 5 overlapped slices. Total context: 29 frames. [Diagram: MFCC features feed 5 MLPs, whose outputs feed a MERGER.]

(c) Setup C. 3 overlapped slices. Total context: 19 frames. [Diagram: MFCC features feed 3 MLPs, whose outputs feed a MERGER.]

Fig. 5.22. Intra phonetic scheme as a split temporal context technique. Each slice consists of 9 concatenated MFCC feature vectors. In the overlapped setups, two neighbouring slices have 4 MFCC frames in common.
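The total contexts quoted in Fig. 5.22 follow directly from the slice length (9 frames) and the overlap (0 or 4 frames). The sketch below reproduces that arithmetic. The function name `slice_windows` is hypothetical, and placing the slices symmetrically around the current frame (offset 0) is an assumption made for illustration, not something stated in the text.

```python
def slice_windows(num_slices, slice_len=9, overlap=0):
    """Return the slice windows and the total temporal context.

    Each window is an inclusive (start, end) pair of frame offsets
    relative to the current frame (offset 0). Centring the whole
    context on the current frame is an assumption of this sketch.
    """
    hop = slice_len - overlap                    # shift between neighbouring slices
    total = slice_len + (num_slices - 1) * hop   # frames covered by all slices
    first = -(total // 2)                        # leftmost frame offset
    windows = [
        (first + i * hop, first + i * hop + slice_len - 1)
        for i in range(num_slices)
    ]
    return windows, total


# Setups A, B and C from Fig. 5.22: (name, number of slices, overlap).
for name, num_slices, overlap in [("A", 3, 0), ("B", 5, 4), ("C", 3, 4)]:
    windows, total = slice_windows(num_slices, overlap=overlap)
    print(f"Setup {name}: total context = {total} frames, slices = {windows}")
```

With a slice length of 9, the total context is 9 + (n - 1)(9 - overlap) frames for n slices, which gives 27, 29 and 19 frames for Setups A, B and C respectively.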

Table 5.9. Phoneme accuracy (%) of each classifier given in Figure 5.22(b).

Classifier    1-state   3-state
MLP^1_{-2}    55.80     60.04
MLP^1_{-1}    65.60     69.05
MLP^1_{0}     67.37     70.64
MLP^1_{1}     65.00     68.31
MLP^1_{2}     54.32     58.91

Table 5.10 shows the results for the proposed approach when 1-state and 3-state models are employed. The first row gives the results for a single MLP with a window length of 9 frames, as shown in Figure 5.20.

The weights for the WAVGlog merger were chosen to stress the central information of the entire temporal context, i.e. a higher weight was given to the classifier situated in the middle. All weights are selected to sum to one
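As a rough illustration of such a merger, the sketch below assumes that WAVGlog denotes a weighted average of the log posteriors of the individual classifiers. This page does not define it, so treat that reading as an assumption. The function name `wavg_log_merge`, the final renormalisation step, the weight values and the toy posteriors are all hypothetical and chosen only for illustration.

```python
import math


def wavg_log_merge(posteriors, weights, eps=1e-12):
    """Merge per-classifier posteriors for one frame.

    posteriors: one list of class posteriors per classifier.
    weights:    one weight per classifier; must sum to one.
    Assumes the merger is a weighted average of log posteriors,
    renormalised to a probability distribution (an assumption).
    """
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to one")
    num_classes = len(posteriors[0])
    # Weighted average of log posteriors, class by class.
    scores = [
        sum(w * math.log(p[k] + eps) for w, p in zip(weights, posteriors))
        for k in range(num_classes)
    ]
    # Back to the probability domain, subtracting the peak for stability.
    peak = max(scores)
    exp_scores = [math.exp(s - peak) for s in scores]
    norm = sum(exp_scores)
    return [e / norm for e in exp_scores]


# Hypothetical weights for the 5 classifiers of Setup B: the middle
# classifier gets the highest weight and the weights sum to one.
centre_weights = [0.1, 0.2, 0.4, 0.2, 0.1]
uniform_weights = [0.2] * 5

# Toy posteriors over 3 classes, one row per classifier (made up).
posteriors = [
    [0.5, 0.3, 0.2],
    [0.6, 0.3, 0.1],
    [0.2, 0.7, 0.1],  # middle classifier
    [0.5, 0.4, 0.1],
    [0.4, 0.4, 0.2],
]

merged_centre = wavg_log_merge(posteriors, centre_weights)
merged_uniform = wavg_log_merge(posteriors, uniform_weights)
print("centre-weighted winner:", merged_centre.index(max(merged_centre)))
print("uniform winner:", merged_uniform.index(max(merged_uniform)))
```

In this toy example the centre-weighted merge follows the middle classifier and selects class index 1, whereas uniform weights select class index 0, which shows how stressing the central slice can change the decision.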
