The parameters re-estimated in the M-step of the Baum-Welch algorithm are calculated as:
\[
c_{jm} = \frac{\sum_{t=1}^{T} \gamma_t(j, m)}{\sum_{t=1}^{T} \sum_{m=1}^{M} \gamma_t(j, m)} \tag{2.63}
\]

\[
\mu_{jm} = \frac{\sum_{t=1}^{T} \gamma_t(j, m)\, x_t}{\sum_{t=1}^{T} \gamma_t(j, m)} \tag{2.64}
\]

\[
\Sigma_{jm} = \frac{\sum_{t=1}^{T} \gamma_t(j, m)\, (x_t - \mu_{jm})(x_t - \mu_{jm})^{T}}{\sum_{t=1}^{T} \gamma_t(j, m)} \tag{2.65}
\]
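As a minimal NumPy sketch of these updates for the GMM attached to a single state j, assuming the occupancies γ_t(j, m) have already been computed in the E-step (the function name gmm_m_step and the array layout are illustrative, not from the text):

```python
import numpy as np

def gmm_m_step(gamma, x):
    """Re-estimate the GMM parameters of one HMM state j from
    occupancies gamma[t, m] = gamma_t(j, m) and observations x[t, :].

    gamma : (T, M) array of state/mixture occupation probabilities
    x     : (T, D) array of feature vectors
    Returns weights c (M,), means mu (M, D), covariances Sigma (M, D, D).
    """
    T, M = gamma.shape
    D = x.shape[1]
    occ = gamma.sum(axis=0)            # sum over t of gamma_t(j, m), shape (M,)

    # Eq. (2.63): occupancy of component m over total state occupancy
    c = occ / occ.sum()

    # Eq. (2.64): occupancy-weighted mean of the observations
    mu = (gamma.T @ x) / occ[:, None]  # (M, D)

    # Eq. (2.65): occupancy-weighted outer products of centered observations
    Sigma = np.empty((M, D, D))
    for m in range(M):
        xc = x - mu[m]                 # centered data, (T, D)
        Sigma[m] = (gamma[:, m, None] * xc).T @ xc / occ[m]
    return c, mu, Sigma
```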
Several issues are of high importance when using HMMs, such as the topology of the model, the initial estimates, and the number of parameters. In the case of GMMs, it has been mentioned that they can model any distribution given a sufficient number of normal mixture components. However, there is a trade-off between the amount of training data and the number of components for obtaining robustly estimated probability distributions. On the other hand, to reduce the number of parameters, the features can be decorrelated by means of feature transformations such as those described in Section 2.2.2. Diagonal covariance matrices are then suitable in the transformed feature space.
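To make the parameter saving concrete, here is a small sketch that counts covariance parameters per component and evaluates a diagonal-covariance Gaussian log-density; the 39-dimensional feature vector is an assumed, typical ASR example rather than a value stated here:

```python
import numpy as np

def diag_gauss_logpdf(x, mu, var):
    """Log-density of a Gaussian with diagonal covariance: only the D
    variances are stored, not the D*(D+1)/2 entries of a full matrix."""
    return -0.5 * np.sum(np.log(2 * np.pi * var) + (x - mu) ** 2 / var, axis=-1)

D = 39  # assumed feature dimensionality, e.g. cepstra with derivatives
print("full covariance parameters per component:", D * (D + 1) // 2)  # 780
print("diagonal covariance parameters per component:", D)             # 39
print(diag_gauss_logpdf(np.zeros(D), np.zeros(D), np.ones(D)))        # -(D/2)*log(2*pi)
```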
2.3.4 Hybrid HMM/ANN
In the previous section, an HMM based on GMMs was presented. The GMM is a generative model whose parameters are estimated so that the likelihood of the training data given the model is maximized. In contrast to generative models, discriminative models are also widely used in ASR applications. In Section 2.2.3, a discriminative model based on multi-layer perceptrons was explained in detail. As mentioned there, the training procedure is based on minimizing an error function. The parameters are estimated so that the model discriminates among the output classes, mainly because the label or target vector at a particular time instant contains a one for the correct class and zeros elsewhere.
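As an illustration of training against such one-hot targets, the sketch below evaluates a cross-entropy error for hypothetical network outputs; cross-entropy is only one common choice, and the specific error function of Section 2.2.3 may differ:

```python
import numpy as np

def cross_entropy(y, target):
    """Cross-entropy error between network outputs y and a one-hot target."""
    return -np.sum(target * np.log(y))

target = np.array([0.0, 0.0, 1.0, 0.0])  # one at the correct class, zeros elsewhere
y = np.array([0.1, 0.2, 0.6, 0.1])       # hypothetical network outputs
print(cross_entropy(y, target))          # -log(0.6), about 0.51
```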
We have also seen that by using a softmax activation function in the output layer of an MLP, the output units have the properties of probabilities: each lies between zero and one, and together they sum to one. In fact, by considering each output as an estimate of the posterior probability of a class given the input, the MLP can be used within the HMM framework.
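A minimal sketch of these probability properties, using arbitrary activation values chosen purely for illustration:

```python
import numpy as np

def softmax(a):
    """Softmax over the output-layer activations a."""
    e = np.exp(a - a.max())  # subtract the max for numerical stability
    return e / e.sum()

y = softmax(np.array([1.2, -0.4, 3.0, 0.1]))  # arbitrary activations
print(y)          # every entry lies between zero and one
print(y.sum())    # the entries sum to one
```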
 