Remark. Of course, the factor 1/M in front of the sum does not play any role in the minimization of the cost function. It allows the definition of the average cost per example, a quantity that makes it easy to compare results obtained with training sets of different sizes.
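The remark can be checked numerically. In the sketch below, the exponential partial cost and the sampled aligned fields are illustrative assumptions, not taken from the text: totals grow with the set size M, while averages stay on a common scale.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed partial cost V(z) = exp(-z) of the aligned field z (illustration only).
def average_cost(z):
    # C(w)/M: the 1/M factor makes costs comparable across set sizes
    return np.exp(-z).mean()

z_small = rng.exponential(size=10)     # aligned fields of a 10-example set
z_large = rng.exponential(size=1000)   # aligned fields of a 1000-example set

total_small = np.exp(-z_small).sum()   # grows roughly linearly with M
total_large = np.exp(-z_large).sum()
# The totals differ by about two orders of magnitude,
# while the averages are directly comparable.
```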
The partial cost V(z_k) must satisfy some conditions in order that the minimum of the cost function corresponds to appropriate weights. Weights w that produce negative aligned fields must have a higher cost than weights producing positive aligned fields. Thus, V(z) must be a non-increasing function of the aligned field z. However, that condition on V is not sufficient, at least in the case of a linearly separable training set: if w* separates L_M correctly, then any weight vector of the form a w* with a > 1 is also a solution, with a lower cost. Hence, a minimization algorithm would never converge, since the cost can decrease without bound by increasing the norm of w without modifying the hyperplane orientation. To avoid this, we impose the constraint that the norm ‖w‖ be constant. The normalizations ‖w‖ = 1 and ‖w‖ = √(N+1) in the extended space (or ‖w‖ = √N in input space) are the most popular ones.
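The scaling argument can be verified directly. In this minimal sketch the separable data set and the choice V(z) = exp(-z) (a non-increasing partial cost) are assumptions for illustration: scaling a separating weight vector lowers the cost without changing the hyperplane, and fixing the norm removes that degeneracy.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linearly separable set in extended (N+1)-dim space:
# each x_k ends with a constant component 1, labels y_k are in {-1, +1}.
N, M = 2, 20
X = np.hstack([rng.normal(size=(M, N)), np.ones((M, 1))])
w_star = np.array([1.0, -1.0, 0.2])    # illustrative separating weights
y = np.sign(X @ w_star)

def cost(w):
    # average cost with the assumed non-increasing V(z) = exp(-z)
    z = y * (X @ w)                    # aligned fields z_k = y_k (w . x_k)
    return np.exp(-z).mean()

c1 = cost(w_star)
c2 = cost(3.0 * w_star)                # larger norm, same hyperplane
assert c2 < c1                         # cost decreases as the norm grows

# Remedy: keep the norm fixed, e.g. ||w|| = 1
w_normalized = w_star / np.linalg.norm(w_star)
```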
The simplest method of minimizing C(w) is to use the algorithm of gradient descent, as described in Chap. 2, which modifies the weights iteratively following w(t+1) = w(t) + Δw(t), with
\[
\Delta w(t) \;=\; -\,\mu\,\frac{\partial C(w)}{\partial w}\bigg|_{t}
\;=\; -\,\frac{\mu}{M}\sum_{k=1}^{M}\frac{\partial V(z_k)}{\partial z_k}\bigg|_{t}\, y_k\, x_k
\;=\; \sum_{k=1}^{M} c_k(t)\, y_k\, x_k,
\]
where μ is the learning rate, and we introduced the relation ∂z_k/∂w = y_k x_k.
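The update rule can be sketched as follows. The separable toy data, the learning rate, and the partial cost V(z) = exp(-z) (hence ∂V/∂z = −exp(−z)) are assumptions for the demonstration; the weights are renormalized after each iteration, as suggested above.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy linearly separable problem in extended space (last input component is 1).
N, M = 2, 50
X = np.hstack([rng.normal(size=(M, N)), np.ones((M, 1))])
y = np.sign(X @ np.array([1.0, -1.0, 0.3]))   # illustrative teacher labels

# Assumed partial cost V(z) = exp(-z), non-increasing in z.
def dV_dz(z):
    return -np.exp(-z)

def average_cost(w):
    return np.exp(-y * (X @ w)).mean()

mu = 0.05                                 # learning rate (assumed value)
w = rng.normal(size=N + 1)
w /= np.linalg.norm(w)                    # constraint ||w|| = 1
cost_start = average_cost(w)

for t in range(500):
    z = y * (X @ w)                       # aligned fields z_k = y_k (w . x_k)
    c_t = -(mu / M) * dV_dz(z)            # c_k(t) in Delta w(t) = sum_k c_k(t) y_k x_k
    w = w + (c_t * y) @ X                 # gradient-descent step
    w /= np.linalg.norm(w)                # renormalize after each iteration

cost_end = average_cost(w)                # lower than the initial cost
```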
It is convenient to normalize the weights after each iteration. The last relation shows that the weights can be written under the general form
\[
w \;=\; \sum_{k=1}^{M} c_k\, y_k\, x_k.
\]
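This general form can be sketched directly. The toy data set, the teacher weights, and the coefficients c_k below are illustrative assumptions; setting every c_k = 1 gives the Hebbian weights discussed in the following paragraph, and those weights necessarily point toward the teacher's half-space, since w_hebb · w* = Σ_k |x_k · w*| > 0.

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy set in extended space; the teacher weights are illustrative.
N, M = 2, 100
X = np.hstack([rng.normal(size=(M, N)), np.ones((M, 1))])
teacher = np.array([2.0, -1.0, 0.0])
y = np.sign(X @ teacher)

# General form: w = sum_k c_k y_k x_k, with algorithm-dependent c_k
c = rng.uniform(0.0, 1.0, size=M)          # arbitrary illustrative coefficients
w_general = (c * y) @ X

# Hebb's rule: c_k = 1 for every example
w_hebb = (y[:, None] * X).sum(axis=0)
# w_hebb . teacher = sum_k |x_k . teacher| is strictly positive.
```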
The parameters c_k, which are the sums of the c_k(t) over all the iterations, depend on the algorithm. If c_k = 1 in the expression of w, the mathematical expression of Hebb's rule is retrieved. That learning rule states that the information used for modifying the synaptic efficacies in the nervous system is the correlation between the activity of the pre-synaptic neuron (neuron excitation) and that of the post-synaptic neuron (neuron firing rate). It is worth