that is defined by the eigenvectors and eigenvalues of the covariance matrix $\Sigma_c$. Note that the mixture model is general: it can approximate any probability distribution with arbitrary accuracy when the number of modes $p$ and the parameters of the Gaussian modes are chosen appropriately. The mixture model therefore provides an analytic description of the geometric characteristics of the distribution of the data set.
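Explicitly, with the notation used above (mixture weights $\alpha_c$, Gaussian modes $f_c$ with mean $w_c$ and covariance $\Sigma_c$), the mixture density takes the standard form
$$
p(z) \;=\; \sum_{c=1}^{p} \alpha_c\, f_c(z),
\qquad
f_c(z) \;=\; \frac{1}{(2\pi)^{n/2}\,|\Sigma_c|^{1/2}}
\exp\!\left(-\tfrac{1}{2}\,(z - w_c)^{\mathsf T}\,\Sigma_c^{-1}\,(z - w_c)\right),
$$
with $\alpha_c \ge 0$ and $\sum_{c=1}^{p}\alpha_c = 1$.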
In that framework, the probabilistic k-means interpretation requires additional assumptions:
• The prior density on the mode set is uniform, i.e., all the $\alpha_c$'s are equal to $1/p$.
• The $p$ normal densities $f_c$ have the same covariance matrix, equal to $\sigma^2 I$, where $I$ is the identity matrix and $\sigma$ is the common standard deviation. Therefore, those densities are given by
$$
f_c(z) \;=\; \frac{1}{(2\pi)^{n/2}\,\sigma^{n}}\,
\exp\!\left(-\frac{\|z - w_c\|^{2}}{2\sigma^{2}}\right).
$$
• The set $A$ is an i.i.d. sample of a random variable that has the probability density $p(z)$.
Those assumptions restrict the domain of validity of the interpretation: the observations must be assumed to be partitioned into $p$ clusters, and those clusters are assumed to be isotropic, to contain the same number of elements, and to have the same probability distribution.
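As an illustration, here is a minimal numerical sketch of the restricted model defined above, computing $f_c(z)$ and the mixture density $p(z)$ with uniform priors $\alpha_c = 1/p$ and a common isotropic covariance $\sigma^2 I$. The function names and the example reference vectors are illustrative choices that do not appear in the text.

import numpy as np

def isotropic_gaussian_density(z, w_c, sigma):
    """f_c(z): isotropic Gaussian with mean w_c and covariance sigma^2 * I."""
    n = z.shape[0]
    norm_const = (2.0 * np.pi) ** (n / 2.0) * sigma ** n
    sq_dist = np.sum((z - w_c) ** 2)
    return np.exp(-sq_dist / (2.0 * sigma ** 2)) / norm_const

def mixture_density(z, W, sigma):
    """p(z): mixture with uniform priors alpha_c = 1/p over the p rows of W."""
    p = W.shape[0]
    return sum(isotropic_gaussian_density(z, w_c, sigma) for w_c in W) / p

# Hypothetical reference vectors: p = 2 modes in dimension n = 2.
W = np.array([[0.0, 0.0],
              [3.0, 3.0]])
z = np.array([1.0, 1.0])
print(mixture_density(z, W, sigma=1.0))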
Thus, the probabilistic version of k -means amounts to estimating the
reference vectors and the common standard deviation by maximizing the
likelihood of the data set A . That estimation is performed by maximizing
$p(z_1, z_2, \ldots, z_N)$, where $z_1, z_2, \ldots, z_N$ are the observations. Under the independence assumption, one has
$$
p(z_1, z_2, \ldots, z_N) \;=\; \prod_{i=1}^{N} p(z_i).
$$
As in the previous section, the allocation function $\chi$ is assumed to assign to each observation $z_i$ its generating mode; the random generating modes are the mixture components. Therefore, the allocation function $\chi$ defines a partition of the training set $A$ into $p$ subsets. If the classifying likelihood is defined by
$$
p(z_1, z_2, \ldots, z_N \mid \chi)
\;=\; \prod_{i=1}^{N} \alpha_{\chi(z_i)}\, f_{\chi(z_i)}(z_i)
\;=\; \frac{1}{p^{N}} \prod_{i=1}^{N} f_{\chi(z_i)}(z_i),
$$
then maximizing the classifying likelihood amounts to minimizing
$$
V(W, \sigma) \;=\; -\ln p(z_1, z_2, \ldots, z_N \mid \chi).
$$
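Under the assumptions above, this criterion can be written out explicitly; the following display is a sketch of that expansion, obtained by substituting the isotropic densities $f_c$ into the classifying likelihood:
$$
V(W, \sigma)
\;=\; N \ln p
\;+\; \frac{Nn}{2}\,\ln\!\left(2\pi\sigma^{2}\right)
\;+\; \frac{1}{2\sigma^{2}} \sum_{i=1}^{N} \bigl\| z_i - w_{\chi(z_i)} \bigr\|^{2}.
$$
For a fixed $\sigma$, the first two terms are constants, so minimizing $V$ with respect to the reference vectors and the allocation function amounts to minimizing the sum of squared distances $\sum_{i=1}^{N} \|z_i - w_{\chi(z_i)}\|^{2}$, i.e., the k-means distortion.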