The k-means algorithm is a traditional unsupervised classification algorithm; it is the ancestor of self-organizing maps. In the next section, we describe both the most classical form of that algorithm and the variants that give insight into its connections with self-organizing maps. For each method, we first describe the standard version of the algorithm, and then its most popular variants (stochastic or probabilistic versions).
7.2 The k-Means Algorithm
7.2.1 Outline of the k-Means Algorithm
The best-known vector quantization method is the k-means algorithm. That method finds the set of reference vectors $W$ and the allocation function $\chi$ by minimizing the cost function
\[
I(W,\chi) \;=\; \sum_{z_i \in A} \bigl\| z_i - w_{\chi(z_i)} \bigr\|^2
\;=\; \sum_{c} \; \sum_{z_i \in P_c \cap A} \bigl\| z_i - w_c \bigr\|^2 .
\]
The quantity
\[
I_c \;=\; \sum_{z_i \in P_c \cap A} \bigl\| z_i - w_c \bigr\|^2
\]
is the local inertia, with respect to the reference vector $w_c$, of the observations of the learning set $A$ that are allocated to that reference vector; those observations therefore belong to the subset $P_c$. That inertia is the total squared quantization error made when the observations of the subset $P_c$ are replaced by the reference vector $w_c$ that represents them. The total cost $I(W,\chi)$, which is to be minimized, is the sum of the local inertias $I_c$. In order to minimize $I(W,\chi)$, one must first define the allocation function $\chi$. The quantity to be minimized then becomes
\[
I(W,\chi) \;=\; \sum_{c} I_c \;=\; \sum_{c} \; \sum_{\substack{z_i \in A \\ \chi(z_i) = c}} \bigl\| z_i - w_c \bigr\|^2 .
\]
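To make that cost concrete, here is a minimal NumPy sketch (not from the text) that evaluates $I(W,\chi)$ and the local inertias $I_c$ for a given set of reference vectors, assuming the usual nearest-neighbor allocation $\chi(z_i) = \arg\min_c \|z_i - w_c\|^2$; the function name `total_cost` and its arguments are illustrative.

```python
import numpy as np

def total_cost(A, W):
    """Cost I(W, chi) and local inertias I_c under nearest-neighbor allocation."""
    # Squared distances between every observation z_i and every reference vector w_c
    d2 = ((A[:, None, :] - W[None, :, :]) ** 2).sum(axis=2)  # shape (n, k)
    chi = d2.argmin(axis=1)                                   # chi(z_i) = closest c
    # Local inertia I_c: squared quantization error over the subset P_c
    I_c = np.array([d2[chi == c, c].sum() for c in range(len(W))])
    return I_c.sum(), I_c
```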
The algorithm proceeds iteratively, and each iteration is split into two phases. The first step minimizes $I(W,\chi)$ with the reference vectors kept fixed: it computes the allocation function $\chi$. In the second step, the allocation function is fixed at the value that was just computed, and the cost function is minimized with respect to the parameters $W$ of the reference vectors. In that two-phase iterative process, the value of the cost function $I(W,\chi)$ decreases at each step.
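As an illustration of that two-phase process, the following Python sketch implements one possible version of the batch iteration: phase 1 allocates each observation to its closest reference vector, and phase 2 replaces each reference vector by the mean of the observations allocated to it, which minimizes $I_c$ with respect to $w_c$. The function name `k_means` and its parameters (number of iterations, random initialization from the data) are assumptions made for the example, not part of the text.

```python
import numpy as np

def k_means(A, k, n_iter=100, seed=0):
    """Two-phase (batch) k-means sketch: I(W, chi) decreases at each iteration."""
    rng = np.random.default_rng(seed)
    # Initialize the reference vectors from k observations drawn at random (assumption)
    W = A[rng.choice(len(A), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        # Phase 1: reference vectors fixed, allocate each z_i to its closest w_c
        d2 = ((A[:, None, :] - W[None, :, :]) ** 2).sum(axis=2)  # shape (n, k)
        chi = d2.argmin(axis=1)                                  # allocation function
        # Phase 2: allocation fixed, minimize I(W, chi) with respect to W:
        # each w_c becomes the mean of the observations allocated to it
        for c in range(k):
            if np.any(chi == c):
                W[c] = A[chi == c].mean(axis=0)
    return W, chi
```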