function to measure the progress of (supervised) learning, the MSE E(x, W) between the gold standard y and the network output ŷ = f(x, W) is used. For simplification we consider the case of a single output as in regression; an extension to multiple outputs is straightforward:

$$E(x, W) = |y - \hat{y}|^2 \qquad (7.42)$$
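As a minimal illustration of Eq. (7.42) and of its straightforward extension to multiple outputs (summing the squared errors over all output units), consider the following sketch; the function names are chosen freely here and are not part of the original text:

```python
import numpy as np

def squared_error(y, y_hat):
    """Eq. (7.42): squared error between the gold standard y and the output y_hat."""
    return (y - y_hat) ** 2

def squared_error_multi(y, y_hat):
    """Straightforward extension to multiple outputs: sum over all output units."""
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    return float(np.sum((y - y_hat) ** 2))

print(squared_error(1.0, 0.8))                       # ~0.04
print(squared_error_multi([1.0, 0.0], [0.8, 0.3]))   # ~0.13
```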
Other target functions are frequently used, such as the McClelland error or cross-entropy. After an initialisation of the weights, e.g., with random values, three steps follow for back propagation:
1. Forward pass, i.e., a 'normal' pass as in the recognition phase.
2. Computation of the MSE according to Eq. (7.42).
3. Backward pass with weight adaptation by the corrective term:

$$w_i = w_i + \Delta w_i, \qquad \Delta w_i = -\beta \cdot \frac{\partial E(x, W)}{\partial w_i}, \qquad (7.43)$$

where β is the step size, which is to be determined empirically, and w_i is an individual weight within a neuron.
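To make the three steps concrete, the following sketch performs one such update for a single logistic (sigmoid) neuron on one training instance; the sigmoid activation, the single-neuron setting and all identifiers are assumptions made purely for illustration:

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def train_step(x, y, w, beta=0.1):
    """One back propagation step for a single logistic neuron (illustrative sketch)."""
    # 1. Forward pass, as in the recognition phase.
    y_hat = sigmoid(np.dot(w, x))
    # 2. Error according to Eq. (7.42).
    error = (y - y_hat) ** 2
    # 3. Backward pass: the chain rule gives dE/dw_i, and Eq. (7.43) yields the
    #    corrective term Delta w_i = -beta * dE/dw_i.
    dE_dw = -2.0 * (y - y_hat) * y_hat * (1.0 - y_hat) * x
    return w - beta * dE_dw, error
```

In practice, the weights would be initialised as noted above (e.g., randomly) and such steps repeated over the training data until a stopping criterion is met.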
As a stopping criterion for the iterative updating of the weights one can either use a maximum number of iterations or a minimal change of the error [20].
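These two stopping criteria might be combined in a simple training loop as sketched below; the sketch reuses the hypothetical train_step function from the previous example, and the parameter names and default values are illustrative assumptions only:

```python
import numpy as np

def train(x_train, y_train, w, beta=0.1, max_iterations=1000, min_error_change=1e-6):
    """Iterative weight updating with two stopping criteria: an iteration budget
    and a minimal change of the summed error between epochs."""
    previous_error = float("inf")
    for _ in range(max_iterations):                  # criterion 1: maximum iterations
        total_error = 0.0
        for x, y in zip(x_train, y_train):
            w, error = train_step(np.asarray(x, dtype=float), y, w, beta)
            total_error += error
        if abs(previous_error - total_error) < min_error_change:
            break                                    # criterion 2: minimal error change
        previous_error = total_error
    return w
```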
A 'good' parameter set can only be determined empirically and based on experience; however, approaches exist to learn these parameters. To avoid overfitting, a sufficient number of training instances is required compared to the number of parameters in the network and the dimensionality of the feature vector. An alternative is resilient propagation, which incorporates the last change of weights into the current change of weights [21].
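For orientation, a simplified sketch of one common resilient propagation (Rprop) variant follows; it keeps a separate step size per weight, omits weight backtracking, and uses the usual default constants, which are not taken from the text:

```python
import numpy as np

def rprop_update(w, grad, prev_grad, step,
                 eta_plus=1.2, eta_minus=0.5, step_min=1e-6, step_max=50.0):
    """One simplified Rprop update: each weight's step size grows while the sign
    of its gradient stays the same, shrinks when the sign flips, and only the
    sign of the current gradient determines the update direction."""
    same_sign = grad * prev_grad
    step = np.where(same_sign > 0, np.minimum(step * eta_plus, step_max), step)
    step = np.where(same_sign < 0, np.maximum(step * eta_minus, step_min), step)
    return w - np.sign(grad) * step, step
```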
By learning the weights, ANNs are able to cope with redundant feature information. The learning process is furthermore discriminative, as the information over all classes is learnt at the same time [17]. Their highly parallel processing is one of the main advantages for efficient implementation. If the temporal context of a feature vector is relevant, this context must be explicitly fed to the network, e.g., by using a fixed-width sliding window that combines several feature vectors into a 'super vector', as in [22].
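Such a fixed-width sliding window might be realised as sketched below; the window width and the edge padding by frame repetition are arbitrary choices for illustration, not prescribed by [22]:

```python
import numpy as np

def super_vectors(features, context=2):
    """Stack each frame with its +/- `context` neighbours into one 'super vector'
    (edge frames are padded by repeating the first/last frame)."""
    padded = np.concatenate([np.repeat(features[:1], context, axis=0),
                             features,
                             np.repeat(features[-1:], context, axis=0)])
    width = 2 * context + 1
    return np.array([padded[t:t + width].ravel() for t in range(len(features))])

# e.g., 100 frames of 39-dimensional features -> 100 super vectors of 195 dimensions
frames = np.random.randn(100, 39)
print(super_vectors(frames, context=2).shape)        # (100, 195)
```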
7.2.3.3 Recurrent Neural Networks
Another technique for introducing past context to neural networks is to add backward
(cyclic) connections to FNNs. The resulting network is called a recurrent neural
network (RNN). RNNs can theoretically map from the entire history of previous
inputs to each output. The recurrent connections implicitly form a kind of memory,
which allows input values to persist in the hidden layer(s) and influence the network
output in the future. RNNs can be trained by back propagation through time (BPTT) [23]. In BPTT, the network is first unfolded over time; training then is similar to training an FNN with back propagation. However, each epoch must run through the output observations in sequential order. Details are found in [23]. If in an RNN future