Remark. Of course, the factor 1/M in front of the sum does not play any role in the minimization of the cost function. It allows the definition of the average cost per example, a quantity that makes it easy to compare results obtained with training sets of different sizes.
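The remark can be checked numerically. In the sketch below, the exponential partial cost and the sampled aligned fields are illustrative assumptions, not taken from the text: totals grow with the set size M, while averages stay on a common scale.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed partial cost V(z) = exp(-z) of the aligned field z (illustration only).
def average_cost(z):
    # C(w)/M: the 1/M factor makes costs comparable across set sizes
    return np.exp(-z).mean()

z_small = rng.exponential(size=10)     # aligned fields of a 10-example set
z_large = rng.exponential(size=1000)   # aligned fields of a 1000-example set

total_small = np.exp(-z_small).sum()   # grows roughly linearly with M
total_large = np.exp(-z_large).sum()
# The totals differ by about two orders of magnitude,
# while the averages are directly comparable.
```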
The partial cost V(z_k) must satisfy some conditions in order that the minimum of the cost function corresponds to appropriate weights. Weights w that produce negative aligned fields must have a higher cost than weights producing positive aligned fields. Thus, V(z) must be a non-increasing function of the aligned field z. However, that condition on V is not sufficient, at least in the case of a linearly separable training set: if w* separates L_M correctly, then any weight vector of the form a w* with a > 1 is also a solution, with a lower cost. Hence, a minimization algorithm would never converge, since the cost can decrease without bound by increasing the norm of w without modifying the hyperplane orientation. To avoid this, we impose the constraint that the norm ‖w‖ be constant. The normalizations ‖w‖ = 1 and ‖w‖ = √(N+1) in the extended space (or ‖w‖ = √N in input space) are the most popular ones.
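The scaling argument can be verified directly. In this minimal sketch the separable data set and the choice V(z) = exp(-z) (a non-increasing partial cost) are assumptions for illustration: scaling a separating weight vector lowers the cost without changing the hyperplane, and fixing the norm removes that degeneracy.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linearly separable set in extended (N+1)-dim space:
# each x_k ends with a constant component 1, labels y_k are in {-1, +1}.
N, M = 2, 20
X = np.hstack([rng.normal(size=(M, N)), np.ones((M, 1))])
w_star = np.array([1.0, -1.0, 0.2])    # illustrative separating weights
y = np.sign(X @ w_star)

def cost(w):
    # average cost with the assumed non-increasing V(z) = exp(-z)
    z = y * (X @ w)                    # aligned fields z_k = y_k (w . x_k)
    return np.exp(-z).mean()

c1 = cost(w_star)
c2 = cost(3.0 * w_star)                # larger norm, same hyperplane
assert c2 < c1                         # cost decreases as the norm grows

# Remedy: keep the norm fixed, e.g. ||w|| = 1
w_normalized = w_star / np.linalg.norm(w_star)
```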
The simplest method of minimizing C(w) is to use the algorithm of gradient descent, as described in Chap. 2, which modifies the weights iteratively following w(t+1) = w(t) + Δw(t), with
\[
\Delta w(t) \;=\; -\,\mu\,\frac{\partial C(w)}{\partial w}\bigg|_{t}
\;=\; -\,\frac{\mu}{M}\sum_{k=1}^{M}\frac{\partial V(z_k)}{\partial z_k}\bigg|_{t}\, y_k\, x_k
\;=\; \sum_{k=1}^{M} c_k(t)\, y_k\, x_k,
\]
where μ is the learning rate, and we introduced the relation ∂z_k/∂w = y_k x_k.
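The update rule can be sketched as follows. The separable toy data, the learning rate, and the partial cost V(z) = exp(-z) (hence ∂V/∂z = −exp(−z)) are assumptions for the demonstration; the weights are renormalized after each iteration, as suggested above.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy linearly separable problem in extended space (last input component is 1).
N, M = 2, 50
X = np.hstack([rng.normal(size=(M, N)), np.ones((M, 1))])
y = np.sign(X @ np.array([1.0, -1.0, 0.3]))   # illustrative teacher labels

# Assumed partial cost V(z) = exp(-z), non-increasing in z.
def dV_dz(z):
    return -np.exp(-z)

def average_cost(w):
    return np.exp(-y * (X @ w)).mean()

mu = 0.05                                 # learning rate (assumed value)
w = rng.normal(size=N + 1)
w /= np.linalg.norm(w)                    # constraint ||w|| = 1
cost_start = average_cost(w)

for t in range(500):
    z = y * (X @ w)                       # aligned fields z_k = y_k (w . x_k)
    c_t = -(mu / M) * dV_dz(z)            # c_k(t) in Delta w(t) = sum_k c_k(t) y_k x_k
    w = w + (c_t * y) @ X                 # gradient-descent step
    w /= np.linalg.norm(w)                # renormalize after each iteration

cost_end = average_cost(w)                # lower than the initial cost
```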
It is convenient to normalize the weights after each iteration. The last relation shows that the weights can be written under the general form
\[
w \;=\; \sum_{k=1}^{M} c_k\, y_k\, x_k.
\]
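This general form can be sketched directly. The toy data set, the teacher weights, and the coefficients c_k below are illustrative assumptions; setting every c_k = 1 gives the Hebbian weights discussed in the following paragraph, and those weights necessarily point toward the teacher's half-space, since w_hebb · w* = Σ_k |x_k · w*| > 0.

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy set in extended space; the teacher weights are illustrative.
N, M = 2, 100
X = np.hstack([rng.normal(size=(M, N)), np.ones((M, 1))])
teacher = np.array([2.0, -1.0, 0.0])
y = np.sign(X @ teacher)

# General form: w = sum_k c_k y_k x_k, with algorithm-dependent c_k
c = rng.uniform(0.0, 1.0, size=M)          # arbitrary illustrative coefficients
w_general = (c * y) @ X

# Hebb's rule: c_k = 1 for every example
w_hebb = (y[:, None] * X).sum(axis=0)
# w_hebb . teacher = sum_k |x_k . teacher| is strictly positive.
```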
The parameters c_k, which are the sums of the c_k(t) over all the iterations, depend on the algorithm. If c_k = 1 in the expression of w, the mathematical expression of Hebb's rule is retrieved. That learning rule states that the information used for modifying the synaptic efficacies in the nervous system is the correlation between the activity of the pre-synaptic neuron (neuron excitation) and that of the post-synaptic neuron (neuron firing rate). It is worth