6.2.1 Real Time Recurrent Learning
6.2.1.1 Introduction
In this section the idea of the ZED risk, discussed in Chap. 5, is adapted
to learning in recurrent neural networks. It is shown how an online learning
algorithm for RNNs, Real Time Recurrent Learning (RTRL) [241], can
make use of the ZED principle. In Sect. 6.2.1.4 two applications are presented:
a symbolic prediction problem and a time series forecasting problem (the
Mackey-Glass chaotic time series). In both cases the new approach has
advantages when compared to the original RTRL.
6.2.1.2 RTRL
The RTRL algorithm for training fully recurrent neural networks was originally
proposed in [241]. Many modifications to this original proposal have since been
presented. This section follows, with minor changes, the notation of [95]. Consider
the fully recurrent neural network of Fig. 6.12. It contains q neurons, d inputs
and p outputs. The state-space description is given by the following equations
x(τ+1) = [φ(w_1^T ξ(τ)), ..., φ(w_q^T ξ(τ))]^T ,    (6.11)
where x represents here the state vector, τ stands for the time, and φ is the
activation function. The (q+d+1) vector w_j contains the weights of neuron
j, and ξ(τ) is another (q+d+1) vector defined by [x(τ), u(τ)]^T. The (d+1)
input vector u(τ) contains 1 in the first component (the fixed bias input);
the remaining components are the d network inputs. The equation that gives
the p vector of network outputs, y, is

y(τ) = Cx(τ) ,    (6.12)

where C is a p × q matrix that is used to select which neurons produce the
network output. The idea is to use the instantaneous gradient of the error
to guide the search for the optimal weights that minimize this error. The
algorithm works by computing the following for each time τ:
Λ_j(τ+1) = D(τ) (W_a(τ) Λ_j(τ) + U_j(τ)) ,    (6.13)

e(τ) = t(τ) − Cx(τ) ,    (6.14)

Δw_j = η e(τ)^T C Λ_j(τ)^T ,    (6.15)
where Λ_j contains the partial derivatives of x w.r.t. the weight vector w_j,
D is a diagonal matrix with the partial derivatives of the activation function
w.r.t. its arguments, W_a contains part of the network weights and U_j is a
zero matrix with the transpose of vector ξ in its j-th row (see [95] for details).
Vector e is the error and t the desired target output.
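To make the recursion concrete, the following Python sketch runs one RTRL step
per time instant according to Eqs. (6.11)-(6.15). It is only an illustration under
assumptions not fixed in the text: tanh is used for the activation φ, the toy
dimensions q, d, p, the learning rate eta and the function name rtrl_step are this
sketch's own choices, and Λ_j is stored as a q × (q+d+1) array so the weight
change is evaluated as e^T C Λ_j in that layout; see [95] and [241] for the exact
formulation.

import numpy as np

# Toy dimensions and learning rate -- illustrative values only.
q, d, p = 5, 2, 1          # neurons, external inputs, outputs
eta = 0.01                 # learning rate
rng = np.random.default_rng(0)

W = 0.1 * rng.standard_normal((q, q + d + 1))   # row j is the weight vector w_j
C = np.zeros((p, q)); C[0, 0] = 1.0             # output read from the first neuron
x = np.zeros(q)                                 # network state x(tau)
Lam = np.zeros((q, q, q + d + 1))               # Lam[j] ~ partial x / partial w_j

def rtrl_step(x, Lam, W, u_ext, target):
    """One RTRL step following Eqs. (6.11)-(6.15); W is updated in place."""
    u = np.concatenate(([1.0], u_ext))          # bias 1 followed by the d inputs
    xi = np.concatenate((x, u))                 # xi(tau) = [x(tau), u(tau)]^T
    net = W @ xi
    x_next = np.tanh(net)                       # Eq. (6.11) with phi = tanh
    D = np.diag(1.0 - x_next ** 2)              # derivatives of phi at its arguments
    Wa = W[:, :q].copy()                        # recurrent part of the weights

    y = C @ x                                   # Eq. (6.12)
    e = target - y                              # Eq. (6.14)

    Lam_next = np.empty_like(Lam)
    dW = np.zeros_like(W)
    for j in range(q):
        Uj = np.zeros((q, q + d + 1))
        Uj[j, :] = xi                           # zero matrix with xi^T in row j
        Lam_next[j] = D @ (Wa @ Lam[j] + Uj)    # Eq. (6.13)
        dW[j, :] = eta * (e @ C @ Lam[j])       # Eq. (6.15), in this sketch's layout
    W += dW
    return x_next, Lam_next

# Driving the network with random inputs and a sine target, purely to exercise
# the update; the applications of Sect. 6.2.1.4 are not reproduced here.
for tau in range(200):
    u_ext = rng.standard_normal(d)
    target = np.array([np.sin(0.1 * tau)])
    x, Lam = rtrl_step(x, Lam, W, u_ext, target)

Note that the sensitivities Λ_j are simply carried forward from one time step to
the next, which is what makes RTRL an online algorithm: no unfolding of the
network over past time instants is required.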
 