For both neurons, the search directions of the algorithm are taken conjugate with
respect to the Hessian matrix:
\[ d_i^T\,H\,d_j = 0, \qquad i \neq j \tag{4.12} \]
where the $d_i$'s are the search directions (at time instants $i$). Hence, $E$ being the
cost function, the CG algorithm can be formulated as
\[ w(t+1) = w(t) + \alpha(t)\,d(t) \tag{4.13} \]
\[ d(0) = -\nabla E(w(0)) \tag{4.14} \]
\[ d(t+1) = -\nabla E(w(t+1)) + \beta(t)\,d(t) \tag{4.15} \]
\[ \beta(t) = \frac{\nabla E^T(w(t+1))\left[\nabla E(w(t+1)) - \nabla E(w(t))\right]}{d^T(t)\left[\nabla E(w(t+1)) - \nabla E(w(t))\right]} \tag{4.16} \]
where eq. (4.16) is called the Hestenes-Stiefel formula (other formulations are
possible). The learning rate parameter is defined as
\[ \alpha(t) = -\frac{d^T(t)\,\nabla E(w(t))}{d^T(t)\,H\,d(t)} \tag{4.17} \]
In the case of these two neurons there is no need to avoid the computation of the
Hessian matrix by means of a line minimization, because this matrix is known a priori
[see eq. (2.10)]. The CG algorithm has been derived on the assumption of
a quadratic error function with a positive definite Hessian matrix: In this case it
finds the minimum after at most n iterations, with n the dimension of the weight
vector. This clearly represents a significant improvement on the simple gradient
descent approach, which could take a very large number of steps to minimize
even a quadratic error function. In the case of MCA EXIN and TLS EXIN, the
error function is not quadratic and, for MCA EXIN, the corresponding Hessian
matrix near the minimum is not positive definite; this implies the possibility
of nondescent directions. To improve the method, the scaled conjugate gradient
(SCG) algorithm [133] has been implemented. It combines the CG approach with
the model trust region approach of the Levenberg-Marquardt algorithm. It adds
some multiple ( λ ) of the unit matrix to the Hessian matrix, giving the following
learning rate:
\[ \alpha(t) = -\frac{d^T(t)\,\nabla E(w(t))}{\delta(t)} = -\frac{d^T(t)\,\nabla E(w(t))}{d^T(t)\,H\,d(t) + \lambda(t)\,\|d(t)\|^2} \tag{4.18} \]
For a positive definite Hessian matrix, δ(t) > 0. If this is not the case, the value
of λ(t) must be increased to make the denominator positive. Let the raised value
of λ(t) be called λ̄(t). In [133] the following value is proposed:
\[ \bar{\lambda}(t) = 2\left(\lambda(t) - \frac{\delta(t)}{\|d(t)\|^2}\right) \tag{4.19} \]
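A minimal Python sketch of how eqs. (4.18) and (4.19) interact (again only an illustration; the function name and the example data are invented, and the Hessian H and the parameter λ are assumed to be available) is:

import numpy as np

def scg_learning_rate(d, g, H, lam):
    """Scaled learning rate of eqs. (4.18)-(4.19): d = search direction, g = gradient."""
    delta = d @ H @ d + lam * (d @ d)        # delta(t), denominator of eq. (4.18)
    if delta <= 0:                           # Hessian not positive definite along d
        lam = 2.0 * (lam - delta / (d @ d))  # eq. (4.19): raised value of lambda
        delta = d @ H @ d + lam * (d @ d)    # equals -d^T H d, positive for lam > 0
    alpha = -(d @ g) / delta                 # eq. (4.18): scaled learning rate
    return alpha, lam

# Example with an indefinite Hessian, where the raise of lambda is triggered:
H = np.array([[1.0, 0.0], [0.0, -2.0]])
d = np.array([0.0, 1.0])
g = np.array([0.5, -1.0])
print(scg_learning_rate(d, g, H, lam=0.5))

After the raise, the denominator equals −d^T(t)Hd(t), which is positive because δ(t) ≤ 0 together with λ(t) > 0 forces d^T(t)Hd(t) < 0.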