for j = N, N − 1, ..., N − M + 1. Then
$$
\| w_j(t) \|_2^2 \;=\; \| w_j(0) \|_2^2 , \qquad t > 0
\tag{3.32}
$$
This analysis and the following considerations use the averaging ODE (3.30) and
so are limited to this approximation (i.e., as seen earlier, to the first part of the
weight vector time evolution).
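Equation (3.32) can be verified directly under an assumption used here only as a sketch, since eq. (3.30) is not restated on this page: that the averaged MSA EXIN ODE has the Rayleigh-quotient-gradient form of MCA EXIN applied to the deflated input of the jth neuron, with $R_j$ denoting the autocorrelation matrix of that deflated input. Then
$$
\frac{d w_j}{dt} \;=\; -\,\frac{1}{\|w_j\|_2^2}\left( R_j\, w_j \;-\; \frac{w_j^T R_j\, w_j}{\|w_j\|_2^2}\, w_j \right)
\quad\Longrightarrow\quad
\frac{d}{dt}\,\|w_j\|_2^2 \;=\; 2\, w_j^T \frac{d w_j}{dt} \;=\; 0 ,
$$
because the right-hand side is orthogonal to $w_j$; hence each weight vector stays on the hypersphere determined by its initialization, which is exactly eq. (3.32).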
The following theorem for the convergence of the learning law holds.
Theorem 86 (Convergence to the MS) If the initial values of the weight vectors satisfy $w_j^T(0)\, z_j \neq 0$ for $j = N, N-1, \ldots, N-M+1$, then the weight vectors provided by eq. (3.29) converge, in the first part of their evolution, to the minor subspace spanned by the corresponding minor components [i.e., eq. (3.27) holds].
Proof. The proof is the same as the proof in [122], because MSA EXIN can be deduced from MSA LUO simply by dividing the learning rate of MSA LUO by $\| w_j(t) \|_2^4$. In particular, the same analysis already carried out for LUO and MCA EXIN can be repeated, and it is easy to see that if the initial weight vector has a modulus of less than 1, then MSA EXIN approaches the minor subspace more quickly than MSA LUO.
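To make the relation between the two learning rates concrete, the sketch below contrasts a single MSA LUO step with the corresponding MSA EXIN step for one neuron. The update forms, the deflated input `x_def`, and the function names are assumptions for illustration (they follow the MCA LUO and MCA EXIN rules applied to the deflated input), not a transcription of eq. (3.29) or of the original MSA LUO law.

```python
import numpy as np

def msa_luo_step(w, x_def, alpha):
    """Assumed MSA LUO update for neuron j.

    w      : current weight vector of neuron j
    x_def  : Gram-Schmidt-deflated input seen by neuron j
    alpha  : learning rate
    """
    y = w @ x_def                               # neuron output
    return w - alpha * y * ((w @ w) * x_def - y * w)

def msa_exin_step(w, x_def, alpha):
    """Assumed MSA EXIN update: same rule with the learning rate divided by ||w||^4."""
    norm4 = (w @ w) ** 2
    return msa_luo_step(w, x_def, alpha / norm4)

# With ||w|| < 1 the factor 1/||w||^4 exceeds 1, so the EXIN step is larger than
# the LUO step in the same direction: the qualitative reason EXIN approaches the
# minor subspace faster when the initial modulus is below 1.
rng = np.random.default_rng(0)
w = rng.standard_normal(4)
w *= 0.5 / np.linalg.norm(w)                    # initial modulus 0.5 < 1
x = rng.standard_normal(4)
print(np.linalg.norm(msa_luo_step(w, x, 0.01) - w),
      np.linalg.norm(msa_exin_step(w, x, 0.01) - w))
```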
Now, a qualitative analysis of the temporal evolution of MSA EXIN and
MSA LUO is given. The quantitative analysis will be the subject of future work.
First, consider eq. (3.22), which represents an adaptive Gram-Schmidt orthonormalization and supplies the input to the jth neuron. This means that the M neurons of the MSA LUO and MSA EXIN neural networks are not independent, except the neuron for j = N (the driving neuron), which converges to the first minor component.
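Eq. (3.22) is not reproduced on this page; the following sketch only illustrates the kind of deflation it describes, with the projection formula and the variable names being assumptions: the input of neuron j is the raw sample with its components along the weight vectors of the neurons i > j removed.

```python
import numpy as np

def deflated_input(x, weights, j):
    """Assumed adaptive Gram-Schmidt deflation producing the input of neuron j.

    x       : current input sample (length N)
    weights : dict {neuron index i: weight vector w_i}
    j       : index of the receiving neuron

    Neurons are indexed j = N, N-1, ..., N-M+1, so the driving neuron j = N
    receives the raw input (no index i > N exists).
    """
    x_j = x.astype(float)
    for i in sorted(weights, reverse=True):          # deflate with higher-indexed neurons first
        if i > j:
            w_i = weights[i]
            x_j -= (w_i @ x_j) / (w_i @ w_i) * w_i   # remove the component along w_i
    return x_j
```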
It can be argued that the neurons labeled by j < N are driven by the weight vectors of the neurons with labels i > j. As will be clear from the analysis leading to Proposition 57, the cost landscape of the MCA neurons has N critical directions in the domain $\mathbb{R}^N - \{0\}$. Among these, all the directions associated
with the nonminor components are saddle directions, which are minima in any
direction of the space spanned by bigger eigenvectors (in the sense of eigenvec-
tors associated with bigger eigenvalues) but are maxima in any direction of the
space spanned by smaller eigenvectors. The critical direction associated with the
minimum eigenvalue is the only global minimum. With these considerations in
mind, let's review the temporal evolution.
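Before reviewing that evolution, the saddle structure just described can be checked numerically on the Rayleigh quotient $r(w) = w^T R w / w^T w$ (assumed here, as in the MCA EXIN analysis, to be the relevant cost), whose critical directions are the eigenvector directions of $R$:

```python
import numpy as np

def rayleigh(w, R):
    """Rayleigh quotient r(w) = w^T R w / w^T w, the assumed MCA cost."""
    return (w @ R @ w) / (w @ w)

rng = np.random.default_rng(1)
A = rng.standard_normal((500, 4))
R = A.T @ A / 500                        # synthetic autocorrelation matrix
lam, V = np.linalg.eigh(R)               # eigenvalues ascending, columns are eigenvectors

k = 1                                    # a nonminor, nonmaximal critical direction
v_k = V[:, k]
eps = 1e-2
up = rayleigh(v_k + eps * V[:, 3], R)    # perturb toward a larger-eigenvalue direction
down = rayleigh(v_k + eps * V[:, 0], R)  # perturb toward the smallest-eigenvalue direction

# r increases toward larger eigenvalues and decreases toward smaller ones,
# so v_k is a saddle; only the minimum-eigenvalue direction is a global minimum.
print(up > rayleigh(v_k, R), down < rayleigh(v_k, R))
```

Perturbing a nonminor eigenvector toward a larger-eigenvalue direction raises $r$, while perturbing it toward a smaller-eigenvalue direction lowers it, which is precisely the saddle behavior described above.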
Transient. The driving neuron weight vector tends to the minor component direction, staying approximately on a hypersphere of radius equal to the initial weight modulus. The other weight vectors stay, to a first approximation, on their respective hyperspheres [see eq. (3.32)] and tend to subspaces orthogonal to the eigenvectors associated with the smaller eigenvalues (the lower subspaces), because the adaptive orthogonalization eliminates degrees of freedom and the corresponding saddle in the cost landscape becomes a minimum.