the data set is presented. It leads to the update of the closest reference vector $w_{\chi(z_i)}$: it amounts to decreasing only the single term $\|w^t_{\chi(z_i)} - z_i\|^2$ of $I(W,\chi)$ by gradient descent. The partial gradient $2(w^t_{\chi(z_i)} - z_i)$ is then used to update the reference vector $w_{\chi(z_i)}$ as follows:

$$w^t_{\chi^t(z_i)} = w^{t-1}_{\chi^t(z_i)} - 2\mu_t \left( w^{t-1}_{\chi^t(z_i)} - z_i \right).$$
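The single-observation update above can be sketched as follows, a minimal NumPy illustration in which the function and variable names (`stochastic_update`, `W`, `z`, `mu`) are ours, not from the text:

```python
import numpy as np

def stochastic_update(W, z, mu):
    """One stochastic k-means step (sketch of the update rule above).
    W: (p, d) array of the p reference vectors; z: one observation;
    mu: current gradient step. Returns the updated W and the index
    of the winning reference vector."""
    # Allocation: index chi(z) of the closest reference vector
    chi = np.argmin(np.linalg.norm(W - z, axis=1))
    # Partial gradient of ||w_chi - z||^2 is 2 (w_chi - z);
    # move only the winner toward the observation
    W[chi] = W[chi] - 2 * mu * (W[chi] - z)
    return W, chi
```

Only the closest reference vector moves; all other terms of the cost $I(W,\chi)$ are untouched by this step.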
A good minimum is obtained by presenting each observation of the data set $A$ repeatedly ($N_{iter}$ must be large enough). The gradient step $\mu$ decreases as the reference vectors are updated. When training starts, the value of $\mu$ is relatively large, and the decrease of the cost function is not strictly guaranteed. As training proceeds, $\mu_t$ becomes small enough that each reference vector update is small, and several updates must be performed to produce a significant change in the cost function. In that regime, there is no major difference between the total gradient and the sum of several partial-gradient steps, so the stochastic gradient algorithm behaves as the classical version of the $k$-means algorithm does. The stochastic algorithm shows that $k$-means may be considered as a competitive algorithm, in which each observation of the data set attracts the closest reference vector. Presenting each observation repeatedly while the gradient step $\mu$ decreases allows finding a satisfactory partition $P$ such that each reference vector is the center of inertia of the corresponding subset of the partition.
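The text only requires that the gradient step decrease over training; the particular law is left open. One common choice, shown here purely as an illustrative assumption (the name `mu_schedule` and the geometric form are ours), interpolates geometrically between a large initial step and a small final one:

```python
def mu_schedule(t, t_max, mu_init=0.5, mu_final=0.005):
    """Geometric decrease of the gradient step from mu_init (large,
    coarse adjustments early in training) to mu_final (small, fine
    adjustments near convergence) over t_max iterations."""
    return mu_init * (mu_final / mu_init) ** (t / t_max)
```

Early iterations thus make large, possibly non-monotone changes to the cost, while late iterations approximate the total-gradient behavior described above.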
The following summary of stochastic $k$-means may be useful for algorithm implementation:

Stochastic k-Means
1. Initialization: $t = 0$. Choose the initial $p$ reference vectors (randomly, in general), the maximal number of iterations $N_{iter}$, and the law of decrease of the gradient step $\mu_t$.
2. Iteration $t$: keeping the reference set $W^{t-1}$ constant, as computed at the previous iteration, choose randomly or sequentially an observation $z_i$, and compute the gradient step (or learning rate) $\mu_t$.
   Allocation phase: given $W^{t-1}$, $z_i$ is assigned to the closest reference vector of $W^{t-1}$, which defines a new allocation function $\chi^t$.
   Minimization phase: the new reference vector $w^t_{\chi(z_i)}$ is computed using the update rule above.
3. Iterate until the specified maximum number of iterations is reached, or until $I$ stabilizes.
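The three steps above can be assembled into a short sketch. This is not the book's code; the parameter names, the random initialization from the data, the geometric step decrease, and the fixed-iteration stopping rule (rather than monitoring $I$) are all illustrative assumptions:

```python
import numpy as np

def stochastic_kmeans(data, p, n_iter=10000, mu_init=0.5, mu_final=0.005, seed=0):
    """Sketch of stochastic k-means as summarized above.
    data: (N, d) array of observations; p: number of reference vectors."""
    rng = np.random.default_rng(seed)
    # 1. Initialization: pick p distinct observations as reference vectors
    W = data[rng.choice(len(data), size=p, replace=False)].astype(float)
    for t in range(n_iter):
        # 2. Draw an observation and compute the gradient step mu_t
        z = data[rng.integers(len(data))]
        mu = mu_init * (mu_final / mu_init) ** (t / n_iter)
        # Allocation phase: index of the closest reference vector
        chi = np.argmin(np.linalg.norm(W - z, axis=1))
        # Minimization phase: move the winner toward the observation
        W[chi] -= 2 * mu * (W[chi] - z)
    # 3. Stop after n_iter iterations (one could instead monitor I)
    return W
```

On well-separated data, each returned reference vector settles near the center of inertia of one subset of the partition, as the text describes.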