$$\mathbf{a}^{(1)} = \mathbf{W}^{(1)} \mathbf{x} \tag{2.16}$$

$$\mathbf{z}^{(1)} = f^{(1)}\big(\mathbf{a}^{(1)}\big) \tag{2.17}$$

where $\mathbf{x} = [1, x_1, x_2, \ldots, x_{N_i}]^T$ and

$$\mathbf{W}^{(1)} =
\begin{bmatrix}
w^{(1)}_{01} & w^{(1)}_{11} & \cdots & w^{(1)}_{N_i 1} \\
w^{(1)}_{02} & w^{(1)}_{12} & \cdots & w^{(1)}_{N_i 2} \\
\vdots & \vdots & \ddots & \vdots \\
w^{(1)}_{0 N_h} & w^{(1)}_{1 N_h} & \cdots & w^{(1)}_{N_i N_h}
\end{bmatrix} \tag{2.18}$$

where $N_h$ is the number of hidden units.
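As a numerical illustration of (2.16)-(2.18), the following NumPy sketch computes the hidden-layer pre-activations with the bias weights absorbed into $\mathbf{W}^{(1)}$ through the leading 1 of $\mathbf{x}$; the array names and layer sizes are purely illustrative and not taken from the text.

```python
import numpy as np

N_i, N_h = 4, 3                       # illustrative numbers of inputs and hidden units
rng = np.random.default_rng(0)

x_raw = rng.standard_normal(N_i)      # feature vector (x_1, ..., x_{N_i})
x = np.concatenate(([1.0], x_raw))    # augmented input x = [1, x_1, ..., x_{N_i}]^T

# One row per hidden unit; column 0 holds the bias weights w_{0j} of (2.18).
W1 = rng.standard_normal((N_h, N_i + 1))

a1 = W1 @ x                           # pre-activations a^(1) = W^(1) x, cf. (2.16)
```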
Analogously to the hidden layer, the output layer $\mathbf{y}$ can be calculated as:
$$\mathbf{a}^{(2)} = \mathbf{W}^{(2)} \mathbf{z}^{(1)} \tag{2.19}$$

and

$$\mathbf{y} = f^{(2)}\big(\mathbf{a}^{(2)}\big) \tag{2.20}$$

so that

$$\mathbf{y} = f^{(2)}\big(\mathbf{W}^{(2)} f^{(1)}(\mathbf{W}^{(1)} \mathbf{x})\big) \tag{2.21}$$

giving the nonlinear, parametric function:

$$\mathbf{y} = f(\mathbf{W}, \mathbf{x}) \tag{2.22}$$
Analogously to the linear schemes described in the previous section, and according to (2.21), the MLP can be seen as a feature-space transformation that takes a multi-feature vector as input.
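The composition in (2.21) can be summarized in a short sketch that treats the two activation functions as arguments, matching the generic form $\mathbf{y} = f(\mathbf{W}, \mathbf{x})$ of (2.22); the function name and the placeholder activations below are our own and only illustrative.

```python
import numpy as np

def mlp_forward(x_raw, W1, W2, f1, f2):
    """Single-hidden-layer MLP forward pass: y = f2(W2 f1(W1 x)), cf. (2.21)/(2.22)."""
    x = np.concatenate(([1.0], x_raw))   # leading 1 so W1 carries the hidden biases
    z1 = f1(W1 @ x)                      # hidden layer, (2.16)-(2.17)
    return f2(W2 @ z1)                   # output layer, (2.19)-(2.20), no explicit output bias

# Illustrative call with placeholder activations; the activations actually used
# in this work (sigmoid and softmax) are given below in (2.23) and (2.24).
rng = np.random.default_rng(0)
N_i, N_h, N_o = 4, 3, 2
W1 = rng.standard_normal((N_h, N_i + 1))
W2 = rng.standard_normal((N_o, N_h))
y = mlp_forward(rng.standard_normal(N_i), W1, W2, np.tanh, lambda a: a)
```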
In this work, MLPs with only one hidden layer are mostly utilized, as shown in Fig. 2.9. As will be explained, the activation functions are required to be differentiable in order to estimate the parameters of the neural network. The hidden layer described in (2.17) uses the sigmoid function given by:
$$f^{(1)}\big(a^{(1)}_j\big) = \frac{1}{1 + \exp\big(-a^{(1)}_j\big)} \tag{2.23}$$
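A direct, element-wise transcription of (2.23) may help fix the notation (the helper name and example values are ours):

```python
import numpy as np

def sigmoid(a):
    """Logistic sigmoid of (2.23), applied element-wise."""
    return 1.0 / (1.0 + np.exp(-a))

a1 = np.array([-2.0, 0.0, 3.0])   # example hidden pre-activations a^(1)
z1 = sigmoid(a1)                  # hidden-layer outputs z^(1), each value in (0, 1)
```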
The output layer utilizes the softmax function:
$$y_j = f^{(2)}\big(a^{(2)}_j\big) = \frac{\exp\big(a^{(2)}_j\big)}{\sum_{k=1}^{N_o} \exp\big(a^{(2)}_k\big)} \tag{2.24}$$

where $N_o$ is the number of output units. The softmax function has the properties that $0 \le y_j \le 1$ and $\sum_j y_j = 1$.
These properties are particularly important in our work, as will be shown in the Hybrid HMM/ANN framework given in Section 2.3.4.
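To illustrate (2.24) and the two properties just mentioned, a small numerical sketch follows; subtracting the maximum activation is only a standard stability trick and does not change the result.

```python
import numpy as np

def softmax(a):
    """Softmax of (2.24); the max is subtracted only to avoid overflow in exp()."""
    e = np.exp(a - np.max(a))
    return e / np.sum(e)

a2 = np.array([1.5, -0.3, 0.2])           # example output pre-activations a^(2)
y = softmax(a2)
assert np.all((y >= 0.0) & (y <= 1.0))    # 0 <= y_j <= 1
assert np.isclose(np.sum(y), 1.0)         # sum_j y_j = 1
```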
 