Self-Organizing Maps and Unsupervised Classification - Neural Networks: Methodology and Applications

Information Technology Reference

In-Depth Information

The implementation of the k -means algorithm can be summarized as

follows:

k-Means Algorithm

1. Initialization phase: t = 0, choose the initial p reference vectors (randomly,

in general), choose the maximal number of iterations N iter .

2. Iteration: at iteration t , the reference set W t− 1 is known from the previous

step:

Allocation phase: update the allocation function χ t that is associated to

the reference set W t− 1 : a reference vector is allocated to each observa-

tion z , as given in (1).

Minimization phase: compute the new reference vectors W t , as given in

(2).

3. Iterate until the specified maximum number of iterations is reached, or

until I stabilizes.

Note that the k -means algorithm may be considered as belonging to the

family of dynamic clustering algorithms [Didday 1976]. It is a general method

that provides a local minimum of a cost function. That method is based on

using two entities: the set of partitions of the original data set into p subsets,

and the space W of the representation (which may be different from the data

set). Then, a subset P k of the partition will be represented by an element

w k , which will be its associated element of W . The discrepancy between an

element x of the data set and its associated element w k will be assessed by a

positive dissimilarity function d such that the smaller d ( x , w k ), the better x

agrees with w k . Thus, it is necessary to define a partition P =

{

P k /k =1 ...p

}

into p data subsets and jointly a set W =

of p representative

elements such that they minimize a cost function. The latter will be defined

from the training set by

{

w k /k =1 ...p

}

p

H ( P,W )=

d ( x , w k ) .

x i ∈P k A

k =1

The dynamic clustering algorithm minimizes that function iteratively way.

First, p representative elements are selected to initialize the process. Then,

the general iteration consists of two phases: first an allocation phase that

minimizes the cost function with respect to the partition, given the represen-

tative elements. Then, during the subsequent phase, the criterion is minimized

with respect to the p representative elements, retaining the previous allocation

function. In the particular case of k -means algorithm, the reference vectors are

the representative elements and the dissimilarity function d is the Euclidean

distance.

Neural Networks: Methodology and Applications

Search WWH ::

Custom Search

Home