\[
V(W, \sigma, \chi) = \frac{1}{2\sigma^2} \sum_{i=1}^{N} \bigl\| z_i - w_{\chi(z_i)} \bigr\|^2 + Nn \ln \sigma + \text{cte}
= \frac{1}{2\sigma^2} I(W, \chi) + Nn \ln \sigma + \text{cte}.
\]
The minimization of V(W, σ, χ) may be performed in two steps:
In the first step, the cost function I(W, χ) that appears in the expression of V(W, σ, χ) is minimized with respect to W and χ. One recognizes the global inertia term discussed in the previous section. The k-means algorithm is applied (itself in two steps, as described above). That step leads to a local minimum of I(W, χ), denoted by I_min.
In the second step, the quantity
\[
\frac{1}{2\sigma^2} I_{\min} + Nn \ln \sigma
\]
is minimized with respect to σ. That expression is minimal when its derivative with respect to σ vanishes, i.e., when \(-I_{\min}/\sigma^3 + Nn/\sigma = 0\). Therefore, one has
\[
\sigma = \sqrt{\frac{I_{\min}}{Nn}}.
\]
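The two steps above can be sketched in pure Python. The toy data, the initial reference vectors, and the fixed iteration count are assumptions made for illustration, not the book's implementation:

```python
import math

def kmeans(points, centers, iters=20):
    """First step: minimize I(W, chi) by the two-phase k-means loop --
    (1) assign each observation to its nearest reference vector,
    (2) move each reference vector to the mean of its cluster."""
    centers = [list(c) for c in centers]
    for _ in range(iters):
        # Allocation phase: chi(z_i) = index of the nearest reference vector.
        chi = [min(range(len(centers)),
                   key=lambda k: sum((z - c) ** 2
                                     for z, c in zip(p, centers[k])))
               for p in points]
        # Minimization phase: each reference vector becomes its cluster mean.
        for k in range(len(centers)):
            cluster = [p for p, a in zip(points, chi) if a == k]
            if cluster:
                centers[k] = [sum(x) / len(cluster) for x in zip(*cluster)]
    # Global inertia I(W, chi) at the (local) minimum reached.
    inertia = sum(sum((z - c) ** 2 for z, c in zip(p, centers[a]))
                  for p, a in zip(points, chi))
    return centers, chi, inertia

# Toy data: two well-separated 2-D groups (values are illustrative only).
data = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3),
        (5.0, 5.0), (5.2, 4.9), (4.9, 5.1)]
_, chi, I_min = kmeans(data, centers=[(0.0, 0.0), (1.0, 1.0)])

# Second step: the optimal isotropic standard deviation follows directly.
N, n = len(data), 2
sigma = math.sqrt(I_min / (N * n))
```

The second step needs no iteration: once I_min is known, σ is given in closed form by the derivative condition above.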
Thus, the k-means algorithm can be interpreted in a probabilistic framework. The minimization of the cost function I(W, χ) amounts to the parametric estimation of a probabilistic mixture model under very restrictive assumptions. The assumption that the components of the mixture are identically distributed and isotropic, with the single covariance matrix σ²I, should be emphasized. From a geometric point of view, the algorithm assumes that the data split into p equally weighted spherical clusters with the same radius. That is not always the case, so the assumption severely limits the application range of the original k-means algorithm.
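A quick check of the per-axis variances of an elongated cloud shows why the single covariance matrix σ²I is restrictive. The cloud below is hand-built for illustration; it is not the data of Fig. 7.3:

```python
# Hand-built elongated cluster: the spread along x is much larger than
# along y, so one isotropic covariance sigma^2 I cannot describe it.
cloud = [(x * 0.5, y * 0.05) for x in range(-10, 11) for y in (-1, 0, 1)]

def axis_variance(points, axis):
    """Empirical variance of the coordinates along one axis."""
    vals = [p[axis] for p in points]
    mean = sum(vals) / len(vals)
    return sum((v - mean) ** 2 for v in vals) / len(vals)

var_x = axis_variance(cloud, 0)
var_y = axis_variance(cloud, 1)
# var_x is several thousand times var_y: the cluster is far from spherical,
# violating the equal-radius spherical-cluster assumption.
```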
The following simulation gives insight into the behavior of k-means when the true data distribution does not comply with the assumptions of the probabilistic model. The observations shown in Fig. 7.3 are significantly non-isotropic and do not comply with the assumption of equal standard deviations. Therefore, k-means in that case favors a solution associated with a partition into two subsets that are as spherical as possible. Thus, the reconstructed partition is far from the original one (Fig. 7.3b).
In order to circumvent the problem, it may be efficient to use a larger number of reference vectors: Fig. 7.4 shows the reference vectors and the associated partition when five reference vectors are used. In that case, four reference vectors are allocated to the left mode, while the fifth represents the other mode. The problem of clustering the reference vectors to reconstruct the two original classes must then be solved; alternative data analysis methods, such as hierarchical classification, can be used for that purpose. That methodology will be demonstrated in the section “Classification and topological maps,” where the introduction of expert knowledge is addressed.
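The merging of reference vectors into classes can be sketched with a tiny agglomerative (single-link) clustering step. The five vectors below are invented to mimic the four-plus-one configuration just described; they are not those of Fig. 7.4:

```python
def single_link_merge(vectors, n_classes):
    """Agglomerative clustering of reference vectors: start with one
    cluster per vector and repeatedly merge the two closest clusters
    (single-link distance) until n_classes clusters remain."""
    clusters = [[v] for v in vectors]

    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    def link(c1, c2):
        # Single-link distance: closest pair across the two clusters.
        return min(dist2(a, b) for a in c1 for b in c2)

    while len(clusters) > n_classes:
        i, j = min(((i, j) for i in range(len(clusters))
                    for j in range(i + 1, len(clusters))),
                   key=lambda ij: link(clusters[ij[0]], clusters[ij[1]]))
        clusters[i] = clusters[i] + clusters.pop(j)
    return clusters

# Invented reference vectors: four near the origin (one mode), one far away.
refs = [(0.0, 0.0), (0.3, 0.1), (0.1, 0.4), (0.4, 0.3), (6.0, 6.0)]
classes = single_link_merge(refs, n_classes=2)
```

The four nearby reference vectors merge into one class before the distant one is touched, reconstructing the two-mode structure.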