The learning rate must decrease as the number of iterations t increases.
It may be piecewise constant, equal to 1/t, or have any other appropriate
form.
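As an illustration, the stochastic update of the reference vectors with such a decreasing learning rate might be sketched as follows (a minimal sketch, not the book's code; the function name, the number of passes over the data, and the random initialization are illustrative choices):

```python
import numpy as np

def stochastic_kmeans(data, n_refs, n_passes=10, seed=None):
    """Online k-means: each observation pulls its closest reference vector
    toward itself by a step equal to the decreasing learning rate mu_t = 1/t."""
    rng = np.random.default_rng(seed)
    # initialize the reference vectors on randomly chosen observations
    w = data[rng.choice(len(data), size=n_refs, replace=False)].copy()
    t = 0
    for _ in range(n_passes):
        for z in rng.permutation(data):
            t += 1
            mu = 1.0 / t                                  # learning rate decreasing with t
            c = np.argmin(((w - z) ** 2).sum(axis=1))     # index of the closest reference vector
            w[c] += mu * (z - w[c])                       # move it toward the observation
    return w
```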
The three experiments shown in Fig. 7.2 illustrate the evolution of the
k-means algorithm, in both its classical and stochastic versions. They
demonstrate the sensitivity of the solution to the number of reference
vectors and to their initialization. For those experiments, the observations
were generated randomly from spherical Gaussian laws with standard deviation
σ = 0.1; those laws are called the Gaussian modes. The first experiment seeks
a two-class partition; it shows the evolution of the set of reference vectors
that capture observations from the four modes. During training, the two
reference vectors are attracted by the two blocks formed by the observations
of the left-hand and right-hand sides, and they stabilize at the centers of
those two observation blocks. The second experiment makes use of the same
observation data set, but seeks a four-class partition, with two different
initializations of the reference vectors: at the center in the first case,
and at the bottom right in the second. In the first case, the initial
position, which is symmetric with respect to the problem, allows the
algorithm to find the four classes produced by the four Gaussian modes. With
the second initialization, three reference vectors are assigned to the data
generated by the two right-hand Gaussian modes, and the remaining one is
assigned to the data generated by the other two modes.
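The experimental setting described above can be reproduced with a few lines of code. The sketch below generates four spherical Gaussian modes with σ = 0.1 and runs a classical (batch) k-means from two different initializations; the positions of the modes, the number of observations, and the initial reference vectors are illustrative guesses, not values taken from Fig. 7.2:

```python
import numpy as np

def kmeans(data, w_init, n_iter=50):
    """Classical (batch) k-means: alternate assignment and centroid update."""
    w = w_init.copy()
    for _ in range(n_iter):
        # assign each observation to its closest reference vector
        labels = np.argmin(((data[:, None, :] - w[None, :, :]) ** 2).sum(-1), axis=1)
        # move each reference vector to the center of its assigned observations
        for c in range(len(w)):
            if np.any(labels == c):
                w[c] = data[labels == c].mean(axis=0)
    return w

rng = np.random.default_rng(0)
modes = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])        # four Gaussian modes
data = np.concatenate([m + 0.1 * rng.standard_normal((100, 2)) for m in modes])

w_center = np.array([[0.45, 0.45], [0.45, 0.55], [0.55, 0.45], [0.55, 0.55]])  # symmetric, central
w_corner = np.array([[1.1, -0.1], [1.2, -0.1], [1.1, -0.2], [1.2, -0.2]])      # bottom right
print(kmeans(data, w_center))   # typically recovers the four modes
print(kmeans(data, w_corner))   # typically leaves one reference vector covering two modes
```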
7.2.3 Probabilistic Interpretation of k-Means
The k-means algorithm minimizes the cost function I(W, χ), which is the sum
of the local inertias I_c. We defined that cost function by following
geometric and kinetic intuition, but it is possible to follow another
approach: the cost function has a natural probabilistic interpretation. In
order to gain insight into it, a probabilistic model of data generation must
be defined: we assume that the observations of the training set are an i.i.d.
sample of a mixture of p Gaussian modes,
$$p(z) = \sum_{c=1}^{p} \alpha_c f_c(z), \qquad \text{with} \quad \sum_{c=1}^{p} \alpha_c = 1.$$
Each Gaussian mode has density f_c, with expectation w_c and covariance
matrix Σ_c. Therefore, this density is given by

$$f_c(z) = \frac{1}{(2\pi)^{n/2} \det(\Sigma_c)^{1/2}} \exp\left(-\frac{1}{2}\,(z - w_c)^{T} \Sigma_c^{-1} (z - w_c)\right).$$
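For concreteness, a mixture density of this form might be evaluated as follows (a minimal sketch; the weights, means, and covariance matrices in the example are illustrative, not taken from the text):

```python
import numpy as np

def gaussian_density(z, w, sigma):
    """Multivariate normal density f_c(z) with expectation w and covariance matrix sigma."""
    n = len(w)
    diff = z - w
    quad = diff @ np.linalg.inv(sigma) @ diff                  # (z - w_c)^T Sigma_c^{-1} (z - w_c)
    norm = (2 * np.pi) ** (n / 2) * np.linalg.det(sigma) ** 0.5
    return np.exp(-0.5 * quad) / norm

def mixture_density(z, alphas, means, sigmas):
    """p(z) = sum_c alpha_c f_c(z), with the alpha_c summing to one."""
    return sum(a * gaussian_density(z, w, s) for a, w, s in zip(alphas, means, sigmas))

# illustrative two-mode example in the plane, with spherical modes of standard deviation 0.1
alphas = [0.5, 0.5]
means = [np.array([0.0, 0.0]), np.array([1.0, 1.0])]
sigmas = [0.01 * np.eye(2), 0.01 * np.eye(2)]
print(mixture_density(np.array([0.1, 0.0]), alphas, means, sigmas))
```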
It is well known that a Gaussian mixture model is a general formalism that
can be used for modeling complex probability distributions [Duda et al.
1973]. The mixture assumption states that each observation is a realization
of one of the hidden random variables with normal density f_c. The mode is
 