Information Technology Reference
In-Depth Information
The implementation of the k -means algorithm can be summarized as
follows:
k-Means Algorithm
1. Initialization phase: t = 0, choose the initial p reference vectors (randomly,
in general), choose the maximal number of iterations N iter .
2. Iteration: at iteration t , the reference set W t− 1 is known from the previous
step:
Allocation phase: update the allocation function χ t that is associated to
the reference set W t− 1 : a reference vector is allocated to each observa-
tion z , as given in (1).
Minimization phase: compute the new reference vectors W t , as given in
(2).
3. Iterate until the specified maximum number of iterations is reached, or
until I stabilizes.
Note that the k -means algorithm may be considered as belonging to the
family of dynamic clustering algorithms [Didday 1976]. It is a general method
that provides a local minimum of a cost function. That method is based on
using two entities: the set of partitions of the original data set into p subsets,
and the space W of the representation (which may be different from the data
set). Then, a subset P k of the partition will be represented by an element
w k , which will be its associated element of W . The discrepancy between an
element x of the data set and its associated element w k will be assessed by a
positive dissimilarity function d such that the smaller d ( x , w k ), the better x
agrees with w k . Thus, it is necessary to define a partition P =
{
P k /k =1 ...p
}
into p data subsets and jointly a set W =
of p representative
elements such that they minimize a cost function. The latter will be defined
from the training set by
{
w k /k =1 ...p
}
p
H ( P,W )=
d ( x , w k ) .
x i ∈P k A
k =1
The dynamic clustering algorithm minimizes that function iteratively way.
First, p representative elements are selected to initialize the process. Then,
the general iteration consists of two phases: first an allocation phase that
minimizes the cost function with respect to the partition, given the represen-
tative elements. Then, during the subsequent phase, the criterion is minimized
with respect to the p representative elements, retaining the previous allocation
function. In the particular case of k -means algorithm, the reference vectors are
the representative elements and the dissimilarity function d is the Euclidean
distance.
Search WWH ::




Custom Search