for j = N, N − 1, ..., N − M + 1. Then
$$
\| w_j(t) \|_2^2 \;=\; \| w_j(0) \|_2^2 , \qquad t > 0
\tag{3.32}
$$
This analysis and the following considerations use the averaging ODE (3.30) and
so are limited to this approximation (i.e., as seen earlier, to the first part of the
weight vector time evolution).
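Equation (3.32) can be verified directly under an assumption used here only as a sketch, since eq. (3.30) is not restated on this page: that the averaged MSA EXIN ODE has the Rayleigh-quotient-gradient form of MCA EXIN applied to the deflated input of the jth neuron, with $R_j$ denoting the autocorrelation matrix of that deflated input. Then
$$
\frac{d w_j}{dt} \;=\; -\,\frac{1}{\|w_j\|_2^2}\left( R_j\, w_j \;-\; \frac{w_j^T R_j\, w_j}{\|w_j\|_2^2}\, w_j \right)
\quad\Longrightarrow\quad
\frac{d}{dt}\,\|w_j\|_2^2 \;=\; 2\, w_j^T \frac{d w_j}{dt} \;=\; 0 ,
$$
because the right-hand side is orthogonal to $w_j$; hence each weight vector stays on the hypersphere determined by its initialization, which is exactly eq. (3.32).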
The following theorem for the convergence of the learning law holds.
Theorem 86 (Convergence to the MS) If the initial values of the weight vectors satisfy $w_j^T(0)\, z_j \neq 0$ for $j = N, N-1, \ldots, N-M+1$, then the weight vectors provided by eq. (3.29) converge, in the first part of their evolution, to the minor subspace spanned by the corresponding minor components [i.e., eq. (3.27) holds].
Proof. The proof is the same as the proof in [122], because MSA EXIN can be deduced from MSA LUO simply by dividing the learning rate of MSA LUO by $\| w_j(t) \|_2^4$. In particular, the same analysis already carried out for LUO and MCA EXIN can be repeated, and it is easy to see that if the initial weight vector has a modulus of less than 1, then MSA EXIN approaches the minor subspace more quickly than MSA LUO.
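To make the relation between the two learning rates concrete, the sketch below contrasts a single MSA LUO step with the corresponding MSA EXIN step for one neuron. The update forms, the deflated input `x_def`, and the function names are assumptions for illustration (they follow the MCA LUO and MCA EXIN rules applied to the deflated input), not a transcription of eq. (3.29) or of the original MSA LUO law.

```python
import numpy as np

def msa_luo_step(w, x_def, alpha):
    """Assumed MSA LUO update for neuron j.

    w      : current weight vector of neuron j
    x_def  : Gram-Schmidt-deflated input seen by neuron j
    alpha  : learning rate
    """
    y = w @ x_def                               # neuron output
    return w - alpha * y * ((w @ w) * x_def - y * w)

def msa_exin_step(w, x_def, alpha):
    """Assumed MSA EXIN update: same rule with the learning rate divided by ||w||^4."""
    norm4 = (w @ w) ** 2
    return msa_luo_step(w, x_def, alpha / norm4)

# With ||w|| < 1 the factor 1/||w||^4 exceeds 1, so the EXIN step is larger than
# the LUO step in the same direction: the qualitative reason EXIN approaches the
# minor subspace faster when the initial modulus is below 1.
rng = np.random.default_rng(0)
w = rng.standard_normal(4)
w *= 0.5 / np.linalg.norm(w)                    # initial modulus 0.5 < 1
x = rng.standard_normal(4)
print(np.linalg.norm(msa_luo_step(w, x, 0.01) - w),
      np.linalg.norm(msa_exin_step(w, x, 0.01) - w))
```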
Now, a qualitative analysis of the temporal evolution of MSA EXIN and
MSA LUO is given. The quantitative analysis will be the subject of future work.
First, consider eq. (3.22), which represents an adaptive Gram-Schmidt orthonormalization and supplies the input to the jth neuron. This means that the M neurons of the MSA LUO and MSA EXIN neural networks are not independent, except the neuron for j = N (the driving neuron), which converges to the first minor component.
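Eq. (3.22) is not reproduced on this page; the following sketch only illustrates the kind of deflation it describes, with the projection formula and the variable names being assumptions: the input of neuron j is the raw sample with its components along the weight vectors of the neurons i > j removed.

```python
import numpy as np

def deflated_input(x, weights, j):
    """Assumed adaptive Gram-Schmidt deflation producing the input of neuron j.

    x       : current input sample (length N)
    weights : dict {neuron index i: weight vector w_i}
    j       : index of the receiving neuron

    Neurons are indexed j = N, N-1, ..., N-M+1, so the driving neuron j = N
    receives the raw input (no index i > N exists).
    """
    x_j = x.astype(float)
    for i in sorted(weights, reverse=True):          # deflate with higher-indexed neurons first
        if i > j:
            w_i = weights[i]
            x_j -= (w_i @ x_j) / (w_i @ w_i) * w_i   # remove the component along w_i
    return x_j
```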
It can be argued that the neurons labeled by j < N are driven by the weight vectors of the neurons with labels i > j. As will be clear from the analysis leading to Proposition 57, the cost landscape of the MCA neurons has N critical directions in the domain $\mathbb{R}^N - \{0\}$. Among these, all the directions associated
with the nonminor components are saddle directions, which are minima in any
direction of the space spanned by bigger eigenvectors (in the sense of eigenvec-
tors associated with bigger eigenvalues) but are maxima in any direction of the
space spanned by smaller eigenvectors. The critical direction associated with the
minimum eigenvalue is the only global minimum. With these considerations in
mind, let's review the temporal evolution.
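Before reviewing that evolution, the saddle structure just described can be checked numerically on the Rayleigh quotient $r(w) = w^T R w / w^T w$ (assumed here, as in the MCA EXIN analysis, to be the relevant cost), whose critical directions are the eigenvector directions of $R$:

```python
import numpy as np

def rayleigh(w, R):
    """Rayleigh quotient r(w) = w^T R w / w^T w, the assumed MCA cost."""
    return (w @ R @ w) / (w @ w)

rng = np.random.default_rng(1)
A = rng.standard_normal((500, 4))
R = A.T @ A / 500                        # synthetic autocorrelation matrix
lam, V = np.linalg.eigh(R)               # eigenvalues ascending, columns are eigenvectors

k = 1                                    # a nonminor, nonmaximal critical direction
v_k = V[:, k]
eps = 1e-2
up = rayleigh(v_k + eps * V[:, 3], R)    # perturb toward a larger-eigenvalue direction
down = rayleigh(v_k + eps * V[:, 0], R)  # perturb toward the smallest-eigenvalue direction

# r increases toward larger eigenvalues and decreases toward smaller ones,
# so v_k is a saddle; only the minimum-eigenvalue direction is a global minimum.
print(up > rayleigh(v_k, R), down < rayleigh(v_k, R))
```

Perturbing a nonminor eigenvector toward a larger-eigenvalue direction raises $r$, while perturbing it toward a smaller-eigenvalue direction lowers it, which is precisely the saddle behavior described above.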
Transient. The driving neuron weight vector tends to the minor component direction, staying approximately on a hypersphere of radius equal to the initial weight modulus. The other weight vectors stay, to a first approximation, on their respective hyperspheres [see eq. (3.32)] and tend to subspaces orthogonal to the eigenvectors associated with the smaller eigenvalues (the lower subspaces), because the adaptive orthogonalization eliminates degrees of freedom and the corresponding saddle in the cost landscape becomes a minimum.