Information Technology Reference
In-Depth Information
The implementation of the
k
-means algorithm can be summarized as
follows:
k-Means Algorithm
1. Initialization phase:
t
= 0, choose the initial
p
reference vectors (randomly,
in general), choose the maximal number of iterations
N
iter
.
2. Iteration: at iteration
t
, the reference set
W
t−
1
is known from the previous
step:
Allocation phase:
update the allocation function
χ
t
that is associated to
the reference set
W
t−
1
: a reference vector is allocated to each observa-
tion
z
, as given in (1).
Minimization phase:
compute the new reference vectors
W
t
, as given in
(2).
3. Iterate until the specified maximum number of iterations is reached, or
until
I
stabilizes.
Note that the
k
-means algorithm may be considered as belonging to the
family of dynamic clustering algorithms [Didday 1976]. It is a general method
that provides a local minimum of a cost function. That method is based on
using two entities: the set of partitions of the original data set into
p
subsets,
and the space
W
of the representation (which may be different from the data
set). Then, a subset
P
k
of the partition will be represented by an element
w
k
, which will be its associated element of
W
. The discrepancy between an
element
x
of the data set and its associated element
w
k
will be assessed by a
positive dissimilarity function
d
such that the smaller
d
(
x
,
w
k
), the better
x
agrees with
w
k
. Thus, it is necessary to define a partition
P
=
{
P
k
/k
=1
...p
}
into
p
data subsets and jointly a set
W
=
of
p
representative
elements such that they minimize a cost function. The latter will be defined
from the training set by
{
w
k
/k
=1
...p
}
p
H
(
P,W
)=
d
(
x
,
w
k
)
.
x
i
∈P
k
A
k
=1
The dynamic clustering algorithm minimizes that function iteratively way.
First,
p
representative elements are selected to initialize the process. Then,
the general iteration consists of two phases: first an allocation phase that
minimizes the cost function with respect to the partition, given the represen-
tative elements. Then, during the subsequent phase, the criterion is minimized
with respect to the
p
representative elements, retaining the previous allocation
function. In the particular case of
k
-means algorithm, the reference vectors are
the representative elements and the dissimilarity function
d
is the Euclidean
distance.
Search WWH ::
Custom Search