7.2.2 Stochastic Version of k-Means
The previous algorithm has all the shortcomings of deterministic optimization algorithms: such algorithms generally depend strongly on the initial conditions and converge to a local minimum, and the optimization mechanism does not allow exploring all the local minima of the cost function. As shown in Chap. 2, optimization can be improved simply by running several optimization processes from various initial conditions and selecting the best local minimum. In the case of unsupervised learning, the best reference vector set and the best partition are selected, i.e., those that generate the smallest value of the cost I(W, χ).
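As a minimal illustration, the following Python sketch (the NumPy-based implementation and all function names are illustrative assumptions, not taken from the text) runs the deterministic algorithm from several random initial conditions and keeps the reference vectors and partition that yield the smallest cost I(W, χ):

```python
import numpy as np

def kmeans_cost(X, W, labels):
    # Cost I(W, chi): sum of squared distances between each observation
    # and the reference vector of the cluster it is allocated to.
    return float(np.sum((X - W[labels]) ** 2))

def kmeans(X, k, n_iter=100, seed=None):
    # One deterministic k-means run from a random initialization.
    rng = np.random.default_rng(seed)
    W = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        # Allocation phase: chi(z_i) = index of the nearest reference vector.
        labels = np.argmin(((X[:, None, :] - W[None, :, :]) ** 2).sum(-1), axis=1)
        # Minimization phase: each reference vector becomes the mean of its cluster.
        for c in range(k):
            if np.any(labels == c):
                W[c] = X[labels == c].mean(axis=0)
    return W, labels

def best_of_restarts(X, k, n_restarts=10):
    # Keep the run with the smallest value of I(W, chi).
    runs = [kmeans(X, k, seed=s) for s in range(n_restarts)]
    return min(runs, key=lambda r: kmeans_cost(X, *r))
```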
At each iteration, during the minimization phase, the reference vector set that minimizes the cost function I(W, χ) for a given allocation function χ is determined. Yet, to guarantee that the cost function decreases, it is not necessary to complete the move towards its global minimum: at time t, given the allocation function χ^t, it is sufficient to find a reference vector set W^t such that

\[
I(W^t, \chi^t) \leq I(W^{t-1}, \chi^t).
\]
One may implement a simple gradient descent algorithm, which guarantees the decrease of I(W, χ) at each step. The computation of the gradient requires the computation of the partial derivatives of I(W, χ^t) with respect to all the components of each reference vector w_c,

\[
\frac{\partial I}{\partial w_c} = \sum_{\substack{z_i \in A \\ \chi^t(z_i) = c}} 2\,(w_c - z_i).
\]
The computation of the reference vectors, which was performed at each step by the relation $w_c = (1/n_c) \sum_{z_i \in P_c \cap A} z_i$, is replaced by

\[
w_c^t = w_c^{t-1} - \mu_t \left.\frac{\partial I}{\partial w_c}\right|_{w_c = w_c^{t-1}}
      = w_c^{t-1} - 2\mu_t \sum_{\substack{z_i \in A \\ \chi^t(z_i) = c}} \bigl(w_c^{t-1} - z_i\bigr).
\]
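As a sketch, this minimization phase could be coded as follows in Python, assuming the observations are the rows of a NumPy array and the allocation χ^t is given as one integer cluster label per observation (names and data layout are assumptions for illustration):

```python
import numpy as np

def gradient_minimization_step(X, W, labels, mu_t):
    # One batch gradient step on I(W, chi_t) for a fixed allocation chi_t:
    #   w_c^t = w_c^{t-1} - 2 * mu_t * sum_{chi_t(z_i) = c} (w_c^{t-1} - z_i)
    W_new = W.copy()
    for c in range(len(W)):
        members = X[labels == c]          # observations allocated to cluster c
        if len(members) > 0:
            grad_c = 2.0 * (W[c] - members).sum(axis=0)   # dI/dw_c
            W_new[c] = W[c] - mu_t * grad_c
    return W_new
```

Note that with the particular choice µ_t = 1/(2 n_c) for a cluster c containing n_c observations, a single step lands exactly on the cluster mean, which recovers the original minimization phase.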
That is the simple gradient descent optimization method that was described in Chap. 2. The allocation function χ^t that appears in the expression of the gradient is defined in the allocation phase of iteration t, the quantity µ_t is the training rate at iteration t, and the reference vector w_c^{t-1} was computed during the previous iteration. That algorithm is not adaptive, since it minimizes the global cost function I(W, χ): to implement any change, the whole data set A has to be used.
The adaptive, or stochastic, version of the k-means algorithm is obtained by the following modification of the basic optimization procedure. The minimization is now performed stochastically: the terms of the sum in relation I(W, χ) are considered separately. At each iteration, a single observation z_i of the data set A is taken into account.
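A minimal Python sketch of such a stochastic version, assuming NumPy observations and a decreasing training rate µ_t = µ_0/t (that schedule, like all names here, is an illustrative choice rather than a prescription of the text):

```python
import numpy as np

def stochastic_kmeans(X, k, n_iter=10000, mu0=0.5, seed=None):
    # Adaptive k-means: each iteration draws a single observation z,
    # allocates it to its nearest reference vector c, and moves only w_c.
    rng = np.random.default_rng(seed)
    W = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for t in range(1, n_iter + 1):
        mu_t = mu0 / t                                 # decreasing training rate
        z = X[rng.integers(len(X))]                    # one observation at random
        c = int(np.argmin(((W - z) ** 2).sum(axis=1))) # allocation of z alone
        # Gradient of the single term ||w_c - z||^2 is 2 (w_c - z):
        W[c] -= 2.0 * mu_t * (W[c] - z)
    return W
```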