The $k$-means algorithm is a traditional unsupervised classification algorithm. It is the ancestor of self-organizing maps. In the next section, we describe both the most classical form of that algorithm and the variants that give insight into its connections with self-organizing maps.
For all methods, we first describe the standard version of each algorithm; then we describe its most popular variants (stochastic or probabilistic versions).
7.2 The $k$-Means Algorithm

7.2.1 Outline of the $k$-Means Algorithm
The best-known vector quantization method is the $k$-means algorithm. That method finds the set of reference vectors $W$ and the allocation function $\chi$ by minimizing the cost function
\[
I(W,\chi) = \sum_{z_i \in A} \left\| z_i - w_{\chi(z_i)} \right\|^2 = \sum_{c} \; \sum_{z_i \in P_c \cap A} \left\| z_i - w_c \right\|^2 .
\]
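To make the definition concrete, the cost can be evaluated directly on a toy one-dimensional learning set. The following Python sketch (the names `allocate` and `cost` are ours, not the chapter's) uses the nearest-reference-vector allocation rule:

```python
# Minimal sketch of the k-means quantization cost on a toy 1-D learning set.
# Function names (allocate, cost) are illustrative, not from the text.

def allocate(z, W):
    """chi(z): index of the nearest reference vector, argmin_c ||z - w_c||^2."""
    return min(range(len(W)), key=lambda c: (z - W[c]) ** 2)

def cost(A, W):
    """I(W, chi) = sum over z_i in A of ||z_i - w_{chi(z_i)}||^2."""
    return sum((z - W[allocate(z, W)]) ** 2 for z in A)

A = [0.0, 0.2, 0.9, 1.1]   # toy learning set
W = [0.1, 1.0]             # two reference vectors
print(cost(A, W))          # total squared quantization error
```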
The quantity
\[
I_c = \sum_{z_i \in P_c \cap A} \left\| z_i - w_c \right\|^2
\]
is the local inertia, with respect to the reference vector $w_c$, of the observations of the learning set $A$ that are allocated to that reference vector; those observations belong to the subset $P_c$. That inertia is the squared quantization error made when the observations of the subset $P_c$ are replaced by the reference vector $w_c$ that represents them. The total cost $I(W,\chi)$, which is to be minimized, is the sum of the local inertias $I_c$. In order to minimize $I(W,\chi)$, one must define the allocation function $\chi$. The quantity to be minimized becomes
\[
I(W,\chi) = \sum_{c} I_c = \sum_{c} \; \sum_{\substack{z_i \in A \\ \chi(z_i) = c}} \left\| z_i - w_c \right\|^2 .
\]
The algorithm is implemented sequentially. Each iteration is split into two phases. The first step minimizes $I(W,\chi)$ with respect to the allocation: assuming that the reference vectors are kept fixed, it computes the allocation function $\chi$. In the second step, the allocation function is kept equal to the value that was just computed, and the cost function is minimized with respect to the parameters $W$ of the reference set. In that two-phase iterative process, the value of the cost function $I(W,\chi)$ decreases at each step.
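The two-phase iteration described above can be sketched as follows in Python, for one-dimensional observations; names such as `kmeans` and the stopping rule (a fixed number of iterations) are our illustrative choices, not the chapter's:

```python
# Minimal sketch of the two-phase (batch) k-means iteration.
# Phase 1: allocate each observation to its nearest reference vector (W fixed).
# Phase 2: move each w_c to the mean of its allocated subset P_c (chi fixed).

def allocate(z, W):
    """chi(z): index of the nearest reference vector."""
    return min(range(len(W)), key=lambda c: (z - W[c]) ** 2)

def kmeans(A, W, n_iter=10):
    W = list(W)
    for _ in range(n_iter):
        # Phase 1: allocation with the reference vectors kept fixed
        chi = [allocate(z, W) for z in A]
        # Phase 2: minimize I(W, chi) over W for the fixed allocation
        for c in range(len(W)):
            P_c = [z for z, a in zip(A, chi) if a == c]
            if P_c:                       # keep w_c unchanged if P_c is empty
                W[c] = sum(P_c) / len(P_c)
    return W

A = [0.0, 0.2, 0.9, 1.1]          # toy learning set
W = kmeans(A, [0.0, 0.5])         # each w_c settles at the mean of its subset
```

Each phase can only decrease the cost, which is why $I(W,\chi)$ is non-increasing over the iterations.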