7.2.2 Stochastic Version of k-Means
The previous algorithm has all the shortcomings of deterministic optimization algorithms: such algorithms generally depend strongly on the initial conditions and converge to a local minimum, and the optimization mechanism does not allow exploring all the local minima of the cost function. As shown in Chap. 2, optimization can be improved simply by running several optimization processes from various initial conditions and selecting the best local minimum. In the case of unsupervised learning, the best reference vector set and the best partition are selected, i.e., those that generate the smallest value of the cost I(W, χ).
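As a minimal illustration, the following Python sketch (the NumPy-based implementation and all function names are illustrative assumptions, not taken from the text) runs the deterministic algorithm from several random initial conditions and keeps the reference vectors and partition that yield the smallest cost I(W, χ):

```python
import numpy as np

def kmeans_cost(X, W, labels):
    # Cost I(W, chi): sum of squared distances between each observation
    # and the reference vector of the cluster it is allocated to.
    return float(np.sum((X - W[labels]) ** 2))

def kmeans(X, k, n_iter=100, seed=None):
    # One deterministic k-means run from a random initialization.
    rng = np.random.default_rng(seed)
    W = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        # Allocation phase: chi(z_i) = index of the nearest reference vector.
        labels = np.argmin(((X[:, None, :] - W[None, :, :]) ** 2).sum(-1), axis=1)
        # Minimization phase: each reference vector becomes the mean of its cluster.
        for c in range(k):
            if np.any(labels == c):
                W[c] = X[labels == c].mean(axis=0)
    return W, labels

def best_of_restarts(X, k, n_restarts=10):
    # Keep the run with the smallest value of I(W, chi).
    runs = [kmeans(X, k, seed=s) for s in range(n_restarts)]
    return min(runs, key=lambda r: kmeans_cost(X, *r))
```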
At each iteration, during the minimization phase, the reference vector set that minimizes the cost function I(W, χ) for a given allocation function χ is determined. Yet, to guarantee that the cost function decreases, it is not necessary to complete the move towards its global minimum: at time t, given the allocation function χ^t, it is sufficient to find a reference vector set W^t such that

\[
I(W^t, \chi^t) \leq I(W^{t-1}, \chi^t).
\]
One may implement a simple gradient descent algorithm, which guarantees the decrease of I(W, χ) at each step. The computation of the gradient requires the computation of the partial derivatives of I(W, χ^t) with respect to all the components of each reference vector w_c,

\[
\frac{\partial I}{\partial w_c} = \sum_{\substack{z_i \in A \\ \chi^t(z_i) = c}} 2\,(w_c - z_i).
\]
The computation of the reference vectors, which was performed at each step by the relation $w_c = (1/n_c) \sum_{z_i \in P_c \cap A} z_i$, is replaced by

\[
w_c^t = w_c^{t-1} - \mu_t \left.\frac{\partial I}{\partial w_c}\right|_{w_c = w_c^{t-1}}
      = w_c^{t-1} - 2\mu_t \sum_{\substack{z_i \in A \\ \chi^t(z_i) = c}} \bigl(w_c^{t-1} - z_i\bigr).
\]
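As a sketch, this minimization phase could be coded as follows in Python, assuming the observations are the rows of a NumPy array and the allocation χ^t is given as one integer cluster label per observation (names and data layout are assumptions for illustration):

```python
import numpy as np

def gradient_minimization_step(X, W, labels, mu_t):
    # One batch gradient step on I(W, chi_t) for a fixed allocation chi_t:
    #   w_c^t = w_c^{t-1} - 2 * mu_t * sum_{chi_t(z_i) = c} (w_c^{t-1} - z_i)
    W_new = W.copy()
    for c in range(len(W)):
        members = X[labels == c]          # observations allocated to cluster c
        if len(members) > 0:
            grad_c = 2.0 * (W[c] - members).sum(axis=0)   # dI/dw_c
            W_new[c] = W[c] - mu_t * grad_c
    return W_new
```

Note that with the particular choice µ_t = 1/(2 n_c) for a cluster c containing n_c observations, a single step lands exactly on the cluster mean, which recovers the original minimization phase.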
That is the simple gradient descent optimization method that was described in Chap. 2. The allocation function χ^t that appears in the expression of the gradient is defined in the allocation phase of iteration t, the quantity µ_t is the training rate at iteration t, and the reference vector w_c^{t-1} was computed during the previous iteration. That algorithm is not adaptive, since it minimizes the global cost function I(W, χ): to implement any change, the whole data set A has to be used.
The adaptive, or stochastic, version of the k-means algorithm is obtained by the following modification of the basic optimization procedure. The minimization is now performed stochastically: the terms of the sum in relation I(W, χ) are considered separately. At each iteration, a single observation z_i of the data set A is taken into account.
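A minimal Python sketch of such a stochastic version, assuming NumPy observations and a decreasing training rate µ_t = µ_0/t (that schedule, like all names here, is an illustrative choice rather than a prescription of the text):

```python
import numpy as np

def stochastic_kmeans(X, k, n_iter=10000, mu0=0.5, seed=None):
    # Adaptive k-means: each iteration draws a single observation z,
    # allocates it to its nearest reference vector c, and moves only w_c.
    rng = np.random.default_rng(seed)
    W = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for t in range(1, n_iter + 1):
        mu_t = mu0 / t                                 # decreasing training rate
        z = X[rng.integers(len(X))]                    # one observation at random
        c = int(np.argmin(((W - z) ** 2).sum(axis=1))) # allocation of z alone
        # Gradient of the single term ||w_c - z||^2 is 2 (w_c - z):
        W[c] -= 2.0 * mu_t * (W[c] - z)
    return W
```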