nonlinear neural networks, which are seriously plagued by the problem of local minima of their cost function, the linear networks have simpler and more meaningful cost landscapes: the usual cost function of a linear feedforward network in autoassociation mode has a unique minimum, corresponding to the projection onto the subspace spanned by the first principal vectors, and all additional critical points are saddle points [4,15].
Consider a linear unit with input $x(t) = [x_1(t), x_2(t), \ldots, x_n(t)]^T$ and output $y(t)$:

$$ y(t) = \sum_{i=1}^{n} w_i(t)\, x_i(t) = w^T(t)\, x(t) \qquad (2.15) $$
where $w(t) = [w_1(t), w_2(t), \ldots, w_n(t)]^T$ is the weight vector. In MCA and PCA analysis, $x$ is a bounded continuous-valued stationary ergodic (see Section 2.3 for the definition) data vector with finite second-order moments.
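As a quick numerical illustration of eq. (2.15), the sketch below computes the unit's output for a single input sample; the dimension n = 4 and the random data are illustrative assumptions, not values from the text.

```python
import numpy as np

# Minimal sketch of the linear unit of eq. (2.15): the output y(t) is the
# inner product of the weight vector w(t) and the input vector x(t).
# The dimension and the random data are illustrative assumptions.
rng = np.random.default_rng(0)

n = 4
x = rng.standard_normal(n)  # input vector x(t)
w = rng.standard_normal(n)  # weight vector w(t)

y = w @ x                   # y(t) = w^T(t) x(t), eq. (2.15)
print(y)
```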
The existing learning laws for the MCA of the autocorrelation matrix $R(t) = E\!\left[x(t)\, x^T(t)\right]$ of the input vector $x(t)$ are listed below.
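Since $x$ is assumed stationary and ergodic, the expectation defining $R$ can be approximated by a time average over samples. A minimal sketch of this estimate, with illustrative sizes and Gaussian data:

```python
import numpy as np

# Sketch: estimating R = E[x(t) x^T(t)] by the sample average
# (1/T) * sum_t x(t) x(t)^T. Ergodicity is what justifies replacing the
# expectation by a time average. Sizes and data model are assumptions.
rng = np.random.default_rng(1)

n, T = 4, 10_000
X = rng.standard_normal((T, n))  # one sample x(t) per row

R_hat = (X.T @ X) / T            # empirical autocorrelation matrix
print(R_hat.shape)               # (n, n), symmetric positive semidefinite
```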
2.2.3.1 Oja's Learning Laws
Changing Oja's learning law for PCA [138] into a constrained anti-Hebbian rule by reversing the sign yields the following rule (OJA) [195]:

$$ w(t+1) = w(t) - \alpha(t)\, y(t) \left[ x(t) - y(t)\, w(t) \right] \qquad (2.16) $$
where $\alpha(t)$ is the positive learning rate. Its explicitly normalized version (OJAn) [195] (inspired by [110]) is

$$ w(t+1) = w(t) - \alpha(t)\, y(t) \left[ x(t) - \frac{y(t)\, w(t)}{w^T(t)\, w(t)} \right] \qquad (2.17) $$
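A minimal sketch of one OJA and one OJAn update, transcribing eqs. (2.16) and (2.17) directly; the function names and the NumPy setting are illustrative assumptions:

```python
import numpy as np

def oja_step(w, x, alpha):
    """One OJA update, eq. (2.16): w <- w - alpha*y*(x - y*w)."""
    y = w @ x                                 # y(t) = w^T(t) x(t)
    return w - alpha * y * (x - y * w)

def ojan_step(w, x, alpha):
    """One OJAn update, eq. (2.17): the weight-decay term of OJA is
    normalized by the squared weight norm w^T(t) w(t)."""
    y = w @ x
    return w - alpha * y * (x - y * w / (w @ w))
```

Both steps move $w$ in the same anti-Hebbian direction; OJAn merely rescales the decay term by $\|w\|^2$.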
Substituting eq. (2.15) into the equations above gives, respectively,
$$ w(t+1) = w(t) - \alpha(t) \left[ x(t)\, x^T(t)\, w(t) - \left( w^T(t)\, x(t)\, x^T(t)\, w(t) \right) w(t) \right] \qquad (2.18) $$

$$ w(t+1) = w(t) - \alpha(t) \left[ x(t)\, x^T(t)\, w(t) - \frac{w^T(t)\, x(t)\, x^T(t)\, w(t)}{w^T(t)\, w(t)}\, w(t) \right] \qquad (2.19) $$
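The substitution can be checked numerically: with $y(t) = w^T(t)\,x(t)$, eq. (2.18) reproduces eq. (2.16) exactly. The sketch below verifies this for one random step and then iterates OJAn on synthetic data to show the weight vector aligning with the minor eigenvector of $R$; all sizes, the data model, and the learning rate are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4

# One-step check that eq. (2.18) is eq. (2.16) with y = w^T x substituted.
w = rng.standard_normal(n)
x = rng.standard_normal(n)
alpha = 0.01
y = w @ x
step_216 = w - alpha * y * (x - y * w)                 # eq. (2.16)
xxT = np.outer(x, x)
step_218 = w - alpha * (xxT @ w - (w @ xxT @ w) * w)   # eq. (2.18)
assert np.allclose(step_216, step_218)

# Minor component extraction with OJAn on data of known covariance R.
A = rng.standard_normal((n, n))
R = A @ A.T + np.eye(n)              # a positive definite autocorrelation
L = np.linalg.cholesky(R)            # to draw x with E[x x^T] = R
v_min = np.linalg.eigh(R)[1][:, 0]   # eigenvector of smallest eigenvalue

w = rng.standard_normal(n)
w /= np.linalg.norm(w)
for _ in range(100_000):
    x = L @ rng.standard_normal(n)
    y = w @ x
    w -= 0.002 * y * (x - y * w / (w @ w))   # OJAn, eq. (2.17)

print(abs(w @ v_min) / np.linalg.norm(w))    # alignment; expected near 1
```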
Under certain assumptions, using the techniques of stochastic approximation theory (see [113,118,142]), the corresponding averaging differential equations