nonlinear neural networks, which are seriously plagued by the problem of local
minima of their cost function, linear networks have simpler and more meaningful
cost landscapes: The usual cost function of the linear feedforward network in
autoassociation mode has a unique minimum corresponding to the projection
onto the subspace generated by the first principal vectors, and all additional
critical points are saddle points [4,15].
Consider a linear unit with input x(t) = [x_1(t), x_2(t), ..., x_n(t)]^T and output y(t):

y(t) = \sum_{i=1}^{n} w_i(t) x_i(t) = w^T(t) x(t)          (2.15)

where w(t) = [w_1(t), w_2(t), ..., w_n(t)]^T is the weight vector. In MCA and PCA analysis, x(t) is a bounded continuous-valued stationary ergodic (see Section 2.3 for the definition) data vector with finite second-order moments.
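As a concrete illustration of eq. (2.15), the following NumPy sketch computes the output y(t) of the linear unit for one sample of x(t); the function name linear_unit_output is a hypothetical label, not taken from the text.

import numpy as np

def linear_unit_output(w, x):
    # Output of the linear unit, eq. (2.15): y(t) = w^T(t) x(t).
    return float(np.dot(w, x))

# Example with an arbitrary 3-dimensional weight and input vector.
w = np.array([0.5, -0.2, 0.1])
x = np.array([1.0, 2.0, -1.0])
y = linear_unit_output(w, x)   # 0.5*1.0 + (-0.2)*2.0 + 0.1*(-1.0) = 0.0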
The existing learning laws for the MCA of the autocorrelation matrix R(t) = E[x(t) x^T(t)] of the input vector x(t) are listed below.
2.2.3.1 Oja's Learning Laws   By changing Oja's learning law for PCA [138] into a constrained anti-Hebbian rule, that is, by reversing the sign, the following rule (OJA) [195] is obtained:

w(t + 1) = w(t) − α(t) y(t) [x(t) − y(t) w(t)]          (2.16)
where α(t) is the positive learning rate. Its explicitly normalized version (OJAn) [195] (inspired by [110]) is

w(t + 1) = w(t) − α(t) y(t) [x(t) − \frac{y(t) w(t)}{w^T(t) w(t)}]          (2.17)
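As a minimal sketch (not from the text) of how the OJA and OJAn rules (2.16) and (2.17) act on a single sample, assuming NumPy and the hypothetical function names oja_step and ojan_step:

import numpy as np

def oja_step(w, x, alpha):
    # OJA rule, eq. (2.16): w <- w - alpha*y*(x - y*w), with y = w^T x.
    y = np.dot(w, x)
    return w - alpha * y * (x - y * w)

def ojan_step(w, x, alpha):
    # OJAn rule, eq. (2.17): the term y*w is divided by the squared norm w^T w.
    y = np.dot(w, x)
    return w - alpha * y * (x - y * w / np.dot(w, w))

# One online update with an arbitrary sample and learning rate alpha(t) = 0.05.
w = np.array([1.0, 0.0])
x = np.array([0.3, -0.7])
w = oja_step(w, x, alpha=0.05)

In an online setting, one such update is applied for every incoming sample x(t).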
Substituting eq. (2.15) into eqs. (2.16) and (2.17) gives, respectively,

w(t + 1) = w(t) − α(t) [x(t) x^T(t) w(t) − (w^T(t) x(t) x^T(t) w(t)) w(t)]          (2.18)

w(t + 1) = w(t) − α(t) [x(t) x^T(t) w(t) − \frac{w^T(t) x(t) x^T(t) w(t)}{w^T(t) w(t)} w(t)]          (2.19)
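As a quick numerical sanity check (not from the text, with hypothetical helper names), one can verify that the substituted forms (2.18) and (2.19) coincide with (2.16) and (2.17) once y(t) = w^T(t) x(t) is inserted:

import numpy as np

def oja_substituted(w, x, alpha):
    # Eq. (2.18): w <- w - alpha*[x x^T w - (w^T x x^T w) w].
    xxTw = x * np.dot(x, w)                 # x x^T w
    return w - alpha * (xxTw - np.dot(w, xxTw) * w)

def ojan_substituted(w, x, alpha):
    # Eq. (2.19): the second term is additionally divided by w^T w.
    xxTw = x * np.dot(x, w)
    return w - alpha * (xxTw - np.dot(w, xxTw) / np.dot(w, w) * w)

w = np.array([0.8, -0.6, 0.1])
x = np.array([1.0, 0.5, -2.0])
y = np.dot(w, x)
# Both checks hold because the substitution is exact algebra.
assert np.allclose(oja_substituted(w, x, 0.01), w - 0.01 * y * (x - y * w))
assert np.allclose(ojan_substituted(w, x, 0.01),
                   w - 0.01 * y * (x - y * w / np.dot(w, w)))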
Under certain assumptions, using the techniques of stochastic approximation theory (see [113,118,142]), the corresponding averaging differential equations