6.2.1 Real Time Recurrent Learning
6.2.1.1 Introduction
In this section the idea of the ZED risk, discussed in Chap. 5, is adapted
to learning in recurrent neural networks. It is shown how an online learning
algorithm for RNNs, Real Time Recurrent Learning (RTRL) [241], can
make use of the ZED principle. In Sect. 6.2.1.4 two applications are presented:
a symbolic prediction problem and a time series forecasting problem (the
Mackey-Glass chaotic time series). In both cases the new approach has
advantages when compared to the original RTRL.
6.2.1.2 RTRL
The RTRL algorithm for training fully recurrent neural networks was originally
proposed in [241]. Many modifications to this original proposal have since been
presented. This section follows, with minor changes, the notation of [95]. Consider
the fully recurrent neural network of Fig. 6.12. It contains q neurons, d inputs
and p outputs. The state-space description is given by the following equations
x(τ+1) = [φ(w_1^T ξ(τ)), ..., φ(w_q^T ξ(τ))]^T ,    (6.11)
where x represents here the state vector, τ stands for the time, and φ is the
activation function. The (q+d+1) vector w_j contains the weights of neuron
j, and ξ(τ) is another (q+d+1) vector defined by [x(τ), u(τ)]^T. The (d+1)
input vector u(τ) contains 1 in the first component (the fixed bias input);
the remaining components are the d network inputs. The equation that gives
the p vector of network outputs, y, is

y(τ) = Cx(τ) ,    (6.12)

where C is a p × q matrix that is used to select which neurons produce the
network output. The idea is to use the instantaneous gradient of the error
to guide the search for the optimal weights that minimize this error. The
algorithm works by computing the following for each time τ:
Λ_j(τ+1) = D(τ) (W_a(τ) Λ_j(τ) + U_j(τ)) ,    (6.13)

e(τ) = t(τ) − Cx(τ) ,    (6.14)

Δw_j = η e(τ)^T C Λ_j(τ)^T ,    (6.15)
where Λ_j contains the partial derivatives of x w.r.t. the weight vector w_j,
D is a diagonal matrix with the partial derivatives of the activation function
w.r.t. its arguments, W_a contains part of the network weights and U_j is a
zero matrix with the transpose of vector ξ in its j-th row (see [95] for details).
Vector e is the error and t the desired target output.
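To make the recursion concrete, the following Python sketch runs one RTRL step
per time instant according to Eqs. (6.11)-(6.15). It is only an illustration under
assumptions not fixed in the text: tanh is used for the activation φ, the toy
dimensions q, d, p, the learning rate eta and the function name rtrl_step are this
sketch's own choices, and Λ_j is stored as a q × (q+d+1) array so the weight
change is evaluated as e^T C Λ_j in that layout; see [95] and [241] for the exact
formulation.

import numpy as np

# Toy dimensions and learning rate -- illustrative values only.
q, d, p = 5, 2, 1          # neurons, external inputs, outputs
eta = 0.01                 # learning rate
rng = np.random.default_rng(0)

W = 0.1 * rng.standard_normal((q, q + d + 1))   # row j is the weight vector w_j
C = np.zeros((p, q)); C[0, 0] = 1.0             # output read from the first neuron
x = np.zeros(q)                                 # network state x(tau)
Lam = np.zeros((q, q, q + d + 1))               # Lam[j] ~ partial x / partial w_j

def rtrl_step(x, Lam, W, u_ext, target):
    """One RTRL step following Eqs. (6.11)-(6.15); W is updated in place."""
    u = np.concatenate(([1.0], u_ext))          # bias 1 followed by the d inputs
    xi = np.concatenate((x, u))                 # xi(tau) = [x(tau), u(tau)]^T
    net = W @ xi
    x_next = np.tanh(net)                       # Eq. (6.11) with phi = tanh
    D = np.diag(1.0 - x_next ** 2)              # derivatives of phi at its arguments
    Wa = W[:, :q].copy()                        # recurrent part of the weights

    y = C @ x                                   # Eq. (6.12)
    e = target - y                              # Eq. (6.14)

    Lam_next = np.empty_like(Lam)
    dW = np.zeros_like(W)
    for j in range(q):
        Uj = np.zeros((q, q + d + 1))
        Uj[j, :] = xi                           # zero matrix with xi^T in row j
        Lam_next[j] = D @ (Wa @ Lam[j] + Uj)    # Eq. (6.13)
        dW[j, :] = eta * (e @ C @ Lam[j])       # Eq. (6.15), in this sketch's layout
    W += dW
    return x_next, Lam_next

# Driving the network with random inputs and a sine target, purely to exercise
# the update; the applications of Sect. 6.2.1.4 are not reproduced here.
for tau in range(200):
    u_ext = rng.standard_normal(d)
    target = np.array([np.sin(0.1 * tau)])
    x, Lam = rtrl_step(x, Lam, W, u_ext, target)

Note that the sensitivities Λ_j are simply carried forward from one time step to
the next, which is what makes RTRL an online algorithm: no unfolding of the
network over past time instants is required.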
 