The first problem is that there is no one-to-one mapping between the feature vectors and the sequence of symbols. The second problem concerns the variability of the feature vectors belonging to the same symbol. Since the feature vectors are regarded as samples of a stochastic process, the statistical decoder has to be able to characterize the patterns common to all feature vectors corresponding to a particular symbol.
To deal with the first problem, the sequence of feature vectors is warped
to the sequence of symbols based on the dynamic programming principle.
On the other hand, for estimating the degree of correspondence of a feature
vector to a particular symbol, a parametric probability distribution is used
in the HMM framework. In the following, a deeper insight into the HMM is given. Then, three common parametric probability distributions, based on Gaussian Mixture Modeling (GMM), Artificial Neural Networks, and discrete distributions, are presented.
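As an illustration of the dynamic-programming warping mentioned above, the following sketch (not taken from the text; function and variable names are hypothetical) aligns a sequence of $T$ feature vectors to a sequence of $N$ symbols under a monotonic left-to-right constraint, assuming per-frame log-scores $\log b_j(x_t)$ are already available.

```python
import numpy as np

def warp_frames_to_symbols(log_scores):
    """Warp T frames onto N symbols by dynamic programming.

    log_scores[t, j] is an assumed per-frame log-likelihood log b_j(x_t).
    Each frame maps to exactly one symbol, and the mapping may only stay
    on the current symbol or advance to the next one (left-to-right).
    Returns the best symbol index for every frame.
    """
    T, N = log_scores.shape
    D = np.full((T, N), -np.inf)        # best accumulated score so far
    back = np.zeros((T, N), dtype=int)  # backpointer: 0 = stay, 1 = advance
    D[0, 0] = log_scores[0, 0]          # alignment starts at the first symbol
    for t in range(1, T):
        for j in range(N):
            stay = D[t - 1, j]
            advance = D[t - 1, j - 1] if j > 0 else -np.inf
            if advance > stay:
                D[t, j], back[t, j] = advance + log_scores[t, j], 1
            else:
                D[t, j], back[t, j] = stay + log_scores[t, j], 0
    # Trace back from the last frame, which must end on the last symbol.
    path = [N - 1]
    for t in range(T - 1, 0, -1):
        path.append(path[-1] - back[t, path[-1]])
    return path[::-1]
```

For example, `warp_frames_to_symbols` applied to a 5x3 score matrix returns a monotonically increasing symbol index per frame, which is the kind of many-to-one alignment the statistical decoder performs within the HMM framework described next.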
2.3.1 Hidden Markov Models
It has been mentioned previously that the task of the statistical decoder con-
sists of mapping a sequence of feature vectors to a sequence of symbols. In
the HMM framework, each symbol is represented as a HMM state. Fig. 2.10
shows a HMM with a three-state left-to-right topology, also known as Bakis
topology [Bakis 76]. Each state emits feature vectors with certain probabilities.
The sequence of feature vectors is produced by an observable stochastic pro-
cess given that we can directly observe this sequence. This stochastic process
is associated with an embedded stochastic process which produces the state
sequence. The word hidden is placed in front of Markov models because the state
sequence is not directly observable, i.e., it is hidden. In the example given in Fig. 2.10,
an observable sequence of feature vectors $X = \{x_1, x_2, x_3, x_4, x_5\}$ has been emitted by a hidden state sequence $S = \{s_1, s_2, s_3, s_4, s_5\} = \{1, 1, 2, 3, 3\}$. In these sequences, the subindices indicate the time instant.
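A minimal sketch of this example with assumed numerical values (the transition probabilities and Gaussian emission parameters below are illustrative, not taken from the text): a three-state Bakis transition matrix and one-dimensional Gaussian state emissions. The hidden process produces the state sequence, while the observable process produces the feature vectors.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed transition matrix a_ij for a 3-state Bakis topology:
# each state may stay or move one state to the right, never backwards.
A = np.array([[0.6, 0.4, 0.0],
              [0.0, 0.7, 0.3],
              [0.0, 0.0, 1.0]])

# Assumed 1-D Gaussian emission parameters (mean, std) per state,
# standing in for the state distributions b_j(x_t).
emission = [(0.0, 1.0), (3.0, 1.0), (6.0, 1.0)]

def sample(T=5):
    """Generate (hidden state sequence, observable feature sequence)."""
    states, obs = [0], []
    for t in range(T):
        if t > 0:
            states.append(rng.choice(3, p=A[states[-1]]))
        mu, sigma = emission[states[-1]]
        obs.append(rng.normal(mu, sigma))
    return states, obs

S, X = sample()  # S is one possible hidden sequence, e.g. the zero-based
                 # analogue of {1, 1, 2, 3, 3}; X is the observable sequence.
```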
In general, a HMM can be characterized by:
Transition probabilities $a_{ij} = P(s_t = j \mid s_{t-1} = i)$, which are the probabilities of going from state $i$ to state $j$.
State distributions $b_j(x_t) = p(x_t \mid s_t = j)$, which are the probabilities of emitting the feature vector $x_t$ when state $j$ is entered.
There are two assumptions in the HMM framework. The first refers to the first-order Markov chain:
$$P(s_t \mid s_{1:t-1}) = P(s_t \mid s_{t-1}) \qquad (2.40)$$
where $s_{1:t-1}$ denotes the state sequence $\{s_1, s_2, \ldots, s_{t-1}\}$. This assumption indicates that the probability of being in a particular state $s_t$ at time instant $t$ only depends on the state at the previous time, $s_{t-1}$.
The second assumption corresponds to the feature vector independence. It indicates that the likelihood of a feature vector only depends on the state $s_t$ at the current time instant $t$.
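Under these two assumptions, the joint likelihood of a state sequence and a feature-vector sequence factorizes into per-step transition and emission terms, $p(X, S) = \prod_t a_{s_{t-1} s_t} \, b_{s_t}(x_t)$. The sketch below evaluates this factorization in the log domain; the parameter values are illustrative assumptions, not from the text.

```python
import numpy as np
from scipy.stats import norm

# Assumed illustrative parameters (same 3-state Bakis layout as above).
A = np.array([[0.6, 0.4, 0.0],
              [0.0, 0.7, 0.3],
              [0.0, 0.0, 1.0]])
emission = [(0.0, 1.0), (3.0, 1.0), (6.0, 1.0)]  # (mean, std) per state

def log_joint(states, obs):
    """log p(X, S) = sum_t [log a_{s_{t-1} s_t} + log b_{s_t}(x_t)].

    The Bakis model is assumed to start in state 0, so no separate
    initial-state probability term is needed here.
    """
    logp = norm.logpdf(obs[0], *emission[states[0]])
    for t in range(1, len(states)):
        logp += np.log(A[states[t - 1], states[t]])        # first-order Markov assumption
        logp += norm.logpdf(obs[t], *emission[states[t]])  # feature vector independence
    return logp

print(log_joint([0, 0, 1, 2, 2], [0.1, -0.3, 2.8, 6.2, 5.9]))
```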