function to measure the progress of (supervised) learning, the MSE E(x, W) between the gold standard y and the network output ŷ = f(x, W) is used. For simplification we consider the case of a single output as in regression; an extension to multiple outputs is straightforward:

$$E(x, W) = |y - \hat{y}|^2 \qquad (7.42)$$
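As a minimal illustration of Eq. (7.42) and of its straightforward extension to multiple outputs (summing the squared errors over all output units), consider the following sketch; the function names are chosen freely here and are not part of the original text:

```python
import numpy as np

def squared_error(y, y_hat):
    """Eq. (7.42): squared error between the gold standard y and the output y_hat."""
    return (y - y_hat) ** 2

def squared_error_multi(y, y_hat):
    """Straightforward extension to multiple outputs: sum over all output units."""
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    return float(np.sum((y - y_hat) ** 2))

print(squared_error(1.0, 0.8))                       # ~0.04
print(squared_error_multi([1.0, 0.0], [0.8, 0.3]))   # ~0.13
```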
Other target functions are frequently used, such as the McClelland error or cross-entropy. After an initialisation of the weights, e.g., with random values, three steps follow for back propagation:
1. Forward pass, i.e., a 'normal' pass as in the recognition phase.
2. Computation of the MSE according to Eq. (7.42).
3. Backward pass with weight adaptation by the corrective term:

$$w_i = w_i + \Delta w_i, \qquad \Delta w_i = -\beta \cdot \frac{\partial E(x, W)}{\partial w_i}, \qquad (7.43)$$

where β is the step size, which is to be determined empirically, and w_i is an individual weight within a neuron.
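To make the three steps concrete, the following sketch performs one such update for a single logistic (sigmoid) neuron on one training instance; the sigmoid activation, the single-neuron setting and all identifiers are assumptions made purely for illustration:

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def train_step(x, y, w, beta=0.1):
    """One back propagation step for a single logistic neuron (illustrative sketch)."""
    # 1. Forward pass, as in the recognition phase.
    y_hat = sigmoid(np.dot(w, x))
    # 2. Error according to Eq. (7.42).
    error = (y - y_hat) ** 2
    # 3. Backward pass: the chain rule gives dE/dw_i, and Eq. (7.43) yields the
    #    corrective term Delta w_i = -beta * dE/dw_i.
    dE_dw = -2.0 * (y - y_hat) * y_hat * (1.0 - y_hat) * x
    return w - beta * dE_dw, error
```

In practice, the weights would be initialised as noted above (e.g., randomly) and such steps repeated over the training data until a stopping criterion is met.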
As a stopping criterion for the iterative updating of the weights one can either use a maximum number of iterations or a minimal change of the error [20].
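These two stopping criteria might be combined in a simple training loop as sketched below; the sketch reuses the hypothetical train_step function from the previous example, and the parameter names and default values are illustrative assumptions only:

```python
import numpy as np

def train(x_train, y_train, w, beta=0.1, max_iterations=1000, min_error_change=1e-6):
    """Iterative weight updating with two stopping criteria: an iteration budget
    and a minimal change of the summed error between epochs."""
    previous_error = float("inf")
    for _ in range(max_iterations):                  # criterion 1: maximum iterations
        total_error = 0.0
        for x, y in zip(x_train, y_train):
            w, error = train_step(np.asarray(x, dtype=float), y, w, beta)
            total_error += error
        if abs(previous_error - total_error) < min_error_change:
            break                                    # criterion 2: minimal error change
        previous_error = total_error
    return w
```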
A 'good' parameter set can only be determined empirically and based on experience; however, approaches exist to learn these parameters. To avoid overfitting, a sufficient number of training instances is required compared to the number of parameters in the network and the dimensionality of the feature vector. An alternative is resilient propagation, which incorporates the last change of weights into the current change of weights [21].
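For orientation, a simplified sketch of one common resilient propagation (Rprop) variant follows; it keeps a separate step size per weight, omits weight backtracking, and uses the usual default constants, which are not taken from the text:

```python
import numpy as np

def rprop_update(w, grad, prev_grad, step,
                 eta_plus=1.2, eta_minus=0.5, step_min=1e-6, step_max=50.0):
    """One simplified Rprop update: each weight's step size grows while the sign
    of its gradient stays the same, shrinks when the sign flips, and only the
    sign of the current gradient determines the update direction."""
    same_sign = grad * prev_grad
    step = np.where(same_sign > 0, np.minimum(step * eta_plus, step_max), step)
    step = np.where(same_sign < 0, np.maximum(step * eta_minus, step_min), step)
    return w - np.sign(grad) * step, step
```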
By learning the weights, ANNs are able to cope with redundant feature information. The learning process is furthermore discriminative, as the information over all classes is learnt at the same time [17]. Their highly parallel processing is one of the main advantages for efficient implementation. If the temporal context of a feature vector is relevant, this context must be explicitly fed to the network, e.g., by using a fixed-width sliding window that combines several feature vectors into a 'super vector', as in [22].
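Such a fixed-width sliding window might be realised as sketched below; the window width and the edge padding by frame repetition are arbitrary choices for illustration, not prescribed by [22]:

```python
import numpy as np

def super_vectors(features, context=2):
    """Stack each frame with its +/- `context` neighbours into one 'super vector'
    (edge frames are padded by repeating the first/last frame)."""
    padded = np.concatenate([np.repeat(features[:1], context, axis=0),
                             features,
                             np.repeat(features[-1:], context, axis=0)])
    width = 2 * context + 1
    return np.array([padded[t:t + width].ravel() for t in range(len(features))])

# e.g., 100 frames of 39-dimensional features -> 100 super vectors of 195 dimensions
frames = np.random.randn(100, 39)
print(super_vectors(frames, context=2).shape)        # (100, 195)
```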
7.2.3.3 Recurrent Neural Networks
Another technique for introducing past context to neural networks is to add backward
(cyclic) connections to FNNs. The resulting network is called a recurrent neural
network (RNN). RNNs can theoretically map from the entire history of previous
inputs to each output. The recurrent connections implicitly form a kind of memory,
which allows input values to persist in the hidden layer(s) and influence the network
output in the future. RNNs can be trained by back propagation through time (BPTT) [23]. In BPTT, the network is first unfolded over time; training then is similar to training an FNN with back propagation. However, each epoch must run through the output observations in sequential order. Details are found in [23]. If in an RNN future