first approximation of the neuron behavior [i.e., until the minimum is reached for the first time, as will be clear later (first approximation assumption)]. Recall that the first approximation is exact if the weights are constrained in a subspace of the weight vector space, such as hyperspheres or hyperplanes. Consider the MCA EXIN linear neuron
\[
y(t) = w^T(t)\, x(t) \qquad (2.47)
\]

where w(t), x(t) ∈ ℝ^n are, respectively, the weight and the input vector. The averaged cost function is [eq. (2.32)]
\[
E = r(w, R) = \frac{w^T R w}{w^T w} \qquad (2.48)
\]

where R = E(x x^T) is the autocorrelation matrix of the input vector x.
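To make eqs. (2.47) and (2.48) concrete, here is a minimal NumPy sketch (an illustration only; the random data, the sample size, and every variable name are assumptions, not taken from the text) that evaluates the neuron output and the averaged cost on a sample estimate of R:

```python
import numpy as np

rng = np.random.default_rng(0)

n = 4
w = rng.standard_normal(n)            # weight vector w(t)
x = rng.standard_normal(n)            # one input sample x(t)
y = w @ x                             # neuron output y(t) = w^T(t) x(t), eq. (2.47)

# Sample estimate of the autocorrelation matrix R = E(x x^T).
X = rng.standard_normal((1000, n))
R = (X.T @ X) / X.shape[0]

E = (w @ R @ w) / (w @ w)             # averaged cost r(w, R), eq. (2.48)
print(y, E)
```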
Assume that R is well behaved (i.e., full rank with distinct eigenvalues). Being an
autocorrelation matrix, it is symmetric, positive definite with orthogonal eigenvectors z_n, z_{n−1}, ..., z_1 and corresponding eigenvalues λ_n < λ_{n−1} < ··· < λ_1.
The cost function E is bounded from below (see Section 2.1). As seen in eq. (2.33), the gradient of the averaged cost function, up to a constant factor, is given by
\[
\frac{\partial E}{\partial w} = \frac{1}{w^T(t)\, w(t)} \left[ R\, w(t) - r(w(t), R)\, w(t) \right] \qquad (2.49)
\]
which is used for the MCA EXIN learning law.
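As an informal illustration of how the gradient (2.49) can drive the weights (a sketch only, not the actual MCA EXIN implementation; the step size alpha, the number of iterations, and the matrix R below are assumptions), one can descend the averaged cost directly:

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed symmetric positive-definite R standing in for E(x x^T).
n = 4
A = rng.standard_normal((n, n))
R = A @ A.T + n * np.eye(n)

def cost(w):
    """Averaged cost r(w, R) = w^T R w / (w^T w), eq. (2.48)."""
    return (w @ R @ w) / (w @ w)

def grad(w):
    """Gradient of the averaged cost, up to a constant factor, eq. (2.49)."""
    return (R @ w - cost(w) * w) / (w @ w)

# A few plain gradient-descent steps with an assumed step size alpha.
w = rng.standard_normal(n)
alpha = 0.1
for _ in range(200):
    w = w - alpha * grad(w)

lam = np.linalg.eigvalsh(R)
print(cost(w), lam.min())     # the cost approaches the smallest eigenvalue of R
```

The MCA EXIN law itself works sample by sample rather than with the averaged R; the sketch is meant only to show the shape of the cost and of its gradient.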
2.5.1 Critical Directions
The following reasoning is an alternative demonstration of Proposition 43 and introduces the notation. As a consequence of the assumptions above, the weight vector space has a basis of orthogonal eigenvectors. Thus,
\[
w(t) = \sum_{i=1}^{n} \omega_i(t)\, z_i \qquad (2.50)
\]
From eqs. (2.49) and (2.50), it follows that the coordinates of the gradient along
the principal components are
\[
\left( \frac{\partial E}{\partial w} \right)^T z_i = \frac{\omega_i(t)}{w^T(t)\, w(t)} \left[ \lambda_i - \frac{\sum_{j=1}^{n} \lambda_j\, \omega_j^2(t)}{\sum_{j=1}^{n} \omega_j^2(t)} \right] \qquad (2.51)
\]
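Eq. (2.51) is easy to check numerically. The sketch below (illustrative; the matrix R and all names are assumptions) builds an eigendecomposition, expands a random w as in eq. (2.50), and compares the projections of the gradient (2.49) onto the eigenvectors with the right-hand side of eq. (2.51):

```python
import numpy as np

rng = np.random.default_rng(1)

# Assumed symmetric positive-definite R with distinct eigenvalues.
n = 4
A = rng.standard_normal((n, n))
R = A @ A.T + n * np.eye(n)
lam, Z = np.linalg.eigh(R)            # eigenvalues lam[i], orthonormal eigenvectors Z[:, i]

w = rng.standard_normal(n)
omega = Z.T @ w                       # coordinates omega_i of w in the eigenvector basis, eq. (2.50)

r = (w @ R @ w) / (w @ w)             # Rayleigh quotient, eq. (2.48)
grad = (R @ w - r * w) / (w @ w)      # averaged gradient, eq. (2.49)

# Left- and right-hand sides of eq. (2.51), all indices i at once.
lhs = Z.T @ grad
rhs = omega / (w @ w) * (lam - (lam @ omega**2) / (omega @ omega))
print(np.allclose(lhs, rhs))          # True
```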
Then the critical points of the cost landscape E are given by
\[
\omega_i(t) = 0 \qquad \text{or} \qquad \lambda_i - \frac{\sum_{j=1}^{n} \lambda_j\, \omega_j^2(t)}{\sum_{j=1}^{n} \omega_j^2(t)} = 0 \qquad (2.52)
\]
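Since the eigenvalues are distinct, the second condition in eq. (2.52) can hold for at most one index, so a critical point has at most one nonzero coordinate: the critical directions are the eigendirections of R. A short numerical confirmation (same kind of assumed R as in the sketches above):

```python
import numpy as np

rng = np.random.default_rng(2)

n = 4
A = rng.standard_normal((n, n))
R = A @ A.T + n * np.eye(n)           # assumed symmetric positive-definite matrix
lam, Z = np.linalg.eigh(R)

def grad(w):
    """Averaged gradient of eq. (2.49) for this R."""
    r = (w @ R @ w) / (w @ w)
    return (R @ w - r * w) / (w @ w)

# Every nonzero multiple of an eigenvector is a critical point: the gradient vanishes there.
for i in range(n):
    assert np.allclose(grad(3.0 * Z[:, i]), 0.0)

# A generic weight vector is not a critical point.
print(np.linalg.norm(grad(rng.standard_normal(n))))   # nonzero in general
```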