that is defined by the eigenvectors and eigenvalues of the covariance matrix $\Sigma_c$. Note that the mixture model is general: it can approximate any probability distribution with arbitrary accuracy when the number of modes $p$ and the parameters of the Gaussian modes are chosen appropriately. The mixture model therefore provides an analytic description of the geometric characteristics of the distribution of the data set.
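Explicitly, with the notation used above (mixture weights $\alpha_c$, Gaussian modes $f_c$ with mean $w_c$ and covariance $\Sigma_c$), the mixture density takes the standard form
$$
p(z) \;=\; \sum_{c=1}^{p} \alpha_c\, f_c(z),
\qquad
f_c(z) \;=\; \frac{1}{(2\pi)^{n/2}\,|\Sigma_c|^{1/2}}
\exp\!\left(-\tfrac{1}{2}\,(z - w_c)^{\mathsf T}\,\Sigma_c^{-1}\,(z - w_c)\right),
$$
with $\alpha_c \ge 0$ and $\sum_{c=1}^{p}\alpha_c = 1$.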
In that framework, the probabilistic k-means interpretation requires additional assumptions:
• The prior density on the mode set is uniform, i.e., all the $\alpha_c$'s are equal to $1/p$.
• The $p$ normal densities $f_c$ have the same covariance matrix, equal to $\sigma^2 I$, where $I$ is the identity matrix and $\sigma$ is the common standard deviation. Therefore, those densities are given by
$$
f_c(z) \;=\; \frac{1}{(2\pi)^{n/2}\,\sigma^{n}}\,
\exp\!\left(-\frac{\|z - w_c\|^{2}}{2\sigma^{2}}\right).
$$
• The set $A$ is an i.i.d. sample of a random variable that has the probability density $p(z)$.
Those assumptions restrict the domain of validity of the interpretation: the observations must be assumed to be partitioned into $p$ clusters, and those clusters are assumed to be isotropic, to contain the same number of elements, and to have the same probability distribution.
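As an illustration, here is a minimal numerical sketch of the restricted model defined above, computing $f_c(z)$ and the mixture density $p(z)$ with uniform priors $\alpha_c = 1/p$ and a common isotropic covariance $\sigma^2 I$. The function names and the example reference vectors are illustrative choices that do not appear in the text.

import numpy as np

def isotropic_gaussian_density(z, w_c, sigma):
    """f_c(z): isotropic Gaussian with mean w_c and covariance sigma^2 * I."""
    n = z.shape[0]
    norm_const = (2.0 * np.pi) ** (n / 2.0) * sigma ** n
    sq_dist = np.sum((z - w_c) ** 2)
    return np.exp(-sq_dist / (2.0 * sigma ** 2)) / norm_const

def mixture_density(z, W, sigma):
    """p(z): mixture with uniform priors alpha_c = 1/p over the p rows of W."""
    p = W.shape[0]
    return sum(isotropic_gaussian_density(z, w_c, sigma) for w_c in W) / p

# Hypothetical reference vectors: p = 2 modes in dimension n = 2.
W = np.array([[0.0, 0.0],
              [3.0, 3.0]])
z = np.array([1.0, 1.0])
print(mixture_density(z, W, sigma=1.0))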
Thus, the probabilistic version of k -means amounts to estimating the
reference vectors and the common standard deviation by maximizing the
likelihood of the data set A . That estimation is performed by maximizing
$p(z_1, z_2, \ldots, z_N)$, where $z_1, z_2, \ldots, z_N$ are the observations. Under the independence assumption, one has
$$
p(z_1, z_2, \ldots, z_N) \;=\; \prod_{i=1}^{N} p(z_i).
$$
As in the previous section, the allocation function $\chi$ is assumed to assign to each observation $z_i$ its generating mode; the random generating modes are the mixture components. Therefore, the allocation function $\chi$ defines a partition of the training set $A$ into $p$ subsets. If the classifying likelihood is defined by
$$
p(z_1, z_2, \ldots, z_N \mid \chi)
\;=\; \prod_{i=1}^{N} \alpha_{\chi(z_i)}\, f_{\chi(z_i)}(z_i)
\;=\; \frac{1}{p^{N}} \prod_{i=1}^{N} f_{\chi(z_i)}(z_i),
$$
then maximizing the classifying likelihood amounts to minimizing
$$
V(W, \sigma) \;=\; -\ln p(z_1, z_2, \ldots, z_N \mid \chi).
$$
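Under the assumptions above, this criterion can be written out explicitly; the following display is a sketch of that expansion, obtained by substituting the isotropic densities $f_c$ into the classifying likelihood:
$$
V(W, \sigma)
\;=\; N \ln p
\;+\; \frac{Nn}{2}\,\ln\!\left(2\pi\sigma^{2}\right)
\;+\; \frac{1}{2\sigma^{2}} \sum_{i=1}^{N} \bigl\| z_i - w_{\chi(z_i)} \bigr\|^{2}.
$$
For a fixed $\sigma$, the first two terms are constants, so minimizing $V$ with respect to the reference vectors and the allocation function amounts to minimizing the sum of squared distances $\sum_{i=1}^{N} \|z_i - w_{\chi(z_i)}\|^{2}$, i.e., the k-means distortion.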