the data set is presented. It leads to the update of the closest reference vector $w_{\chi(z_i)}$: it amounts to decreasing only the single term $\|w^t_{\chi(z_i)} - z_i\|^2$ of $I(W,\chi)$ by gradient descent. The partial gradient $2(w^t_{\chi(z_i)} - z_i)$ is then used to update the reference vector $w_{\chi(z_i)}$ as follows:

$$w^t_{\chi^t(z_i)} = w^{t-1}_{\chi^t(z_i)} - 2\mu_t \left( w^{t-1}_{\chi^t(z_i)} - z_i \right).$$
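The single-observation update above can be sketched as follows, a minimal NumPy illustration in which the function and variable names (`stochastic_update`, `W`, `z`, `mu`) are ours, not from the text:

```python
import numpy as np

def stochastic_update(W, z, mu):
    """One stochastic k-means step (sketch of the update rule above).
    W: (p, d) array of the p reference vectors; z: one observation;
    mu: current gradient step. Returns the updated W and the index
    of the winning reference vector."""
    # Allocation: index chi(z) of the closest reference vector
    chi = np.argmin(np.linalg.norm(W - z, axis=1))
    # Partial gradient of ||w_chi - z||^2 is 2 (w_chi - z);
    # move only the winner toward the observation
    W[chi] = W[chi] - 2 * mu * (W[chi] - z)
    return W, chi
```

Only the closest reference vector moves; all other terms of the cost $I(W,\chi)$ are untouched by this step.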
A good minimum is obtained by presenting each observation of the data set $A$ repeatedly ($N_{iter}$ must be large enough). The gradient step $\mu$ decreases as the reference vectors are updated. When training starts, the value of $\mu$ is relatively large, and the decrease of the cost function is not strictly guaranteed. As training proceeds, $\mu_t$ becomes small enough that each reference vector update is small, and several updates must be performed to produce a significant change in the cost function. In that regime, there is no major difference between the total gradient and the sum of several partial-gradient steps, so the stochastic gradient algorithm behaves as the classical version of the $k$-means algorithm does. The stochastic algorithm shows that $k$-means may be considered as a competitive algorithm, in which each observation of the data set attracts the closest reference vector. Presenting each observation repeatedly while the gradient step $\mu$ decreases allows finding a satisfactory partition $P$ such that each reference vector is the center of inertia of the corresponding subset of the partition.
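The text only requires that the gradient step decrease over training; the particular law is left open. One common choice, shown here purely as an illustrative assumption (the name `mu_schedule` and the geometric form are ours), interpolates geometrically between a large initial step and a small final one:

```python
def mu_schedule(t, t_max, mu_init=0.5, mu_final=0.005):
    """Geometric decrease of the gradient step from mu_init (large,
    coarse adjustments early in training) to mu_final (small, fine
    adjustments near convergence) over t_max iterations."""
    return mu_init * (mu_final / mu_init) ** (t / t_max)
```

Early iterations thus make large, possibly non-monotone changes to the cost, while late iterations approximate the total-gradient behavior described above.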
The following summary of stochastic $k$-means may be useful for algorithm implementation:

Stochastic k-Means
1. Initialization: $t = 0$. Choose the initial $p$ reference vectors (randomly, in general), the maximal number of iterations $N_{iter}$, and the law of decrease of the gradient step $\mu_t$.
2. Iteration $t$: keeping the reference set $W^{t-1}$ constant, as computed at the previous iteration, choose randomly or sequentially an observation $z_i$, and compute the gradient step (or learning rate) $\mu_t$.
   Allocation phase: given $W^{t-1}$, $z_i$ is assigned to the closest reference vector of $W^{t-1}$, which defines a new allocation function $\chi^t$.
   Minimization phase: the new reference vector $w^t_{\chi(z_i)}$ is computed using the update rule above.
3. Iterate until the specified maximum number of iterations is reached, or until $I$ stabilizes.
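The three steps above can be assembled into a short sketch. This is not the book's code; the parameter names, the random initialization from the data, the geometric step decrease, and the fixed-iteration stopping rule (rather than monitoring $I$) are all illustrative assumptions:

```python
import numpy as np

def stochastic_kmeans(data, p, n_iter=10000, mu_init=0.5, mu_final=0.005, seed=0):
    """Sketch of stochastic k-means as summarized above.
    data: (N, d) array of observations; p: number of reference vectors."""
    rng = np.random.default_rng(seed)
    # 1. Initialization: pick p distinct observations as reference vectors
    W = data[rng.choice(len(data), size=p, replace=False)].astype(float)
    for t in range(n_iter):
        # 2. Draw an observation and compute the gradient step mu_t
        z = data[rng.integers(len(data))]
        mu = mu_init * (mu_final / mu_init) ** (t / n_iter)
        # Allocation phase: index of the closest reference vector
        chi = np.argmin(np.linalg.norm(W - z, axis=1))
        # Minimization phase: move the winner toward the observation
        W[chi] -= 2 * mu * (W[chi] - z)
    # 3. Stop after n_iter iterations (one could instead monitor I)
    return W
```

On well-separated data, each returned reference vector settles near the center of inertia of one subset of the partition, as the text describes.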