\[
V(W, \sigma, \chi) = \frac{1}{2\sigma^2} \sum_{i=1}^{N} \bigl\| z_i - w_{\chi(z_i)} \bigr\|^2 + Nn \ln \sigma + \text{cte}
= \frac{1}{2\sigma^2} I(W, \chi) + Nn \ln \sigma + \text{cte}.
\]
The minimization of V(W, σ, χ) may be performed in two steps:
In the first step, the cost function I(W, χ) that appears in the expression of V(W, σ, χ) is minimized with respect to W and χ. One recognizes the global inertia term discussed in the previous section. The k-means algorithm is applied (itself in two steps, as described above). That step leads to a local minimum of I(W, χ), denoted by I_min.
In the second step, the quantity
\[
\frac{1}{2\sigma^2} I_{\min} + Nn \ln \sigma
\]
is minimized with respect to σ. That expression is minimal when its derivative with respect to σ vanishes, i.e., when \(-I_{\min}/\sigma^3 + Nn/\sigma = 0\). Therefore, one has
\[
\sigma = \sqrt{\frac{I_{\min}}{Nn}}.
\]
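The two steps above can be sketched in pure Python. The toy data, the initial reference vectors, and the fixed iteration count are assumptions made for illustration, not the book's implementation:

```python
import math

def kmeans(points, centers, iters=20):
    """First step: minimize I(W, chi) by the two-phase k-means loop --
    (1) assign each observation to its nearest reference vector,
    (2) move each reference vector to the mean of its cluster."""
    centers = [list(c) for c in centers]
    for _ in range(iters):
        # Allocation phase: chi(z_i) = index of the nearest reference vector.
        chi = [min(range(len(centers)),
                   key=lambda k: sum((z - c) ** 2
                                     for z, c in zip(p, centers[k])))
               for p in points]
        # Minimization phase: each reference vector becomes its cluster mean.
        for k in range(len(centers)):
            cluster = [p for p, a in zip(points, chi) if a == k]
            if cluster:
                centers[k] = [sum(x) / len(cluster) for x in zip(*cluster)]
    # Global inertia I(W, chi) at the (local) minimum reached.
    inertia = sum(sum((z - c) ** 2 for z, c in zip(p, centers[a]))
                  for p, a in zip(points, chi))
    return centers, chi, inertia

# Toy data: two well-separated 2-D groups (values are illustrative only).
data = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3),
        (5.0, 5.0), (5.2, 4.9), (4.9, 5.1)]
_, chi, I_min = kmeans(data, centers=[(0.0, 0.0), (1.0, 1.0)])

# Second step: the optimal isotropic standard deviation follows directly.
N, n = len(data), 2
sigma = math.sqrt(I_min / (N * n))
```

The second step needs no iteration: once I_min is known, σ is given in closed form by the derivative condition above.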
Thus, the k-means algorithm can be interpreted in a probabilistic framework. The minimization of the cost function I(W, χ) amounts to the parametric estimation of a probabilistic mixture model under very restrictive assumptions. The assumption that the components of the mixture are identically distributed and isotropic, with the single covariance matrix σ²I, should be emphasized. From a geometric point of view, the algorithm assumes that the data split into p equally weighted spherical clusters with the same radius. That is not always the case, so the assumption severely limits the application range of the original k-means algorithm.
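A quick check of the per-axis variances of an elongated cloud shows why the single covariance matrix σ²I is restrictive. The cloud below is hand-built for illustration; it is not the data of Fig. 7.3:

```python
# Hand-built elongated cluster: the spread along x is much larger than
# along y, so one isotropic covariance sigma^2 I cannot describe it.
cloud = [(x * 0.5, y * 0.05) for x in range(-10, 11) for y in (-1, 0, 1)]

def axis_variance(points, axis):
    """Empirical variance of the coordinates along one axis."""
    vals = [p[axis] for p in points]
    mean = sum(vals) / len(vals)
    return sum((v - mean) ** 2 for v in vals) / len(vals)

var_x = axis_variance(cloud, 0)
var_y = axis_variance(cloud, 1)
# var_x is several thousand times var_y: the cluster is far from spherical,
# violating the equal-radius spherical-cluster assumption.
```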
The following simulation gives insight into the behavior of k-means when the true data distribution does not comply with the assumptions of the probabilistic model. The observations shown in Fig. 7.3 are significantly non-isotropic and do not comply with the assumption of equal standard deviations. Therefore, k-means in that case favors a solution associated with a partition into two subsets that are as spherical as possible. Thus, the reconstructed partition is far from the original one (Fig. 7.3b).
In order to circumvent the problem, it may be efficient to use a larger number of reference vectors: Fig. 7.4 shows the reference vectors and the associated partition when five reference vectors are used. In that case, four reference vectors are allocated to the left mode, while the fifth represents the other mode. The problem of clustering the reference vectors to reconstruct the two original classes must then be solved; alternative data analysis methods, such as hierarchical classification, can be used for that purpose. That methodology will be demonstrated in the section “Classification and topological maps,” where the introduction of expert knowledge is addressed.
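The merging of reference vectors into classes can be sketched with a tiny agglomerative (single-link) clustering step. The five vectors below are invented to mimic the four-plus-one configuration just described; they are not those of Fig. 7.4:

```python
def single_link_merge(vectors, n_classes):
    """Agglomerative clustering of reference vectors: start with one
    cluster per vector and repeatedly merge the two closest clusters
    (single-link distance) until n_classes clusters remain."""
    clusters = [[v] for v in vectors]

    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    def link(c1, c2):
        # Single-link distance: closest pair across the two clusters.
        return min(dist2(a, b) for a in c1 for b in c2)

    while len(clusters) > n_classes:
        i, j = min(((i, j) for i in range(len(clusters))
                    for j in range(i + 1, len(clusters))),
                   key=lambda ij: link(clusters[ij[0]], clusters[ij[1]]))
        clusters[i] = clusters[i] + clusters.pop(j)
    return clusters

# Invented reference vectors: four near the origin (one mode), one far away.
refs = [(0.0, 0.0), (0.3, 0.1), (0.1, 0.4), (0.4, 0.3), (6.0, 6.0)]
classes = single_link_merge(refs, n_classes=2)
```

The four nearby reference vectors merge into one class before the distant one is touched, reconstructing the two-mode structure.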