The learning rate must decrease as the number of iterations t increases.
It may be piecewise constant, equal to 1/t, or have any other appropriate
form.
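As an illustration, the stochastic update of the reference vectors with such a decreasing learning rate might be sketched as follows (a minimal sketch, not the book's code; the function name, the number of passes over the data, and the random initialization are illustrative choices):

```python
import numpy as np

def stochastic_kmeans(data, n_refs, n_passes=10, seed=None):
    """Online k-means: each observation pulls its closest reference vector
    toward itself by a step equal to the decreasing learning rate mu_t = 1/t."""
    rng = np.random.default_rng(seed)
    # initialize the reference vectors on randomly chosen observations
    w = data[rng.choice(len(data), size=n_refs, replace=False)].copy()
    t = 0
    for _ in range(n_passes):
        for z in rng.permutation(data):
            t += 1
            mu = 1.0 / t                                  # learning rate decreasing with t
            c = np.argmin(((w - z) ** 2).sum(axis=1))     # index of the closest reference vector
            w[c] += mu * (z - w[c])                       # move it toward the observation
    return w
```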
The three experiments shown in Fig. 7.2 illustrate the evolution of the
k-means algorithm, in both its classical and stochastic versions. They
demonstrate the sensitivity of the solution to the number of reference
vectors and to their initialization. For those experiments, the observations
were generated randomly from spherical Gaussian laws with standard deviation
σ = 0.1; those laws are called the Gaussian modes. The first experiment seeks
a two-class partition; it shows the evolution of the set of reference vectors
that capture observations from the four modes. During training, the two
reference vectors are attracted by the two blocks formed by the observations
of the left-hand and right-hand sides, and they stabilize at the centers of
those two observation blocks. The second experiment makes use of the same
observation data set, but seeks a four-class partition, with two different
initializations of the reference vectors: at the center in the first case,
and at the bottom right in the second. In the first case, the initial
position, which is symmetric with respect to the problem, allows the
algorithm to find the four classes produced by the four Gaussian modes. With
the second initialization, three reference vectors are assigned to the data
generated by the two right-hand Gaussian modes, and the remaining one is
assigned to the data generated by the other two modes.
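The experimental setting described above can be reproduced with a few lines of code. The sketch below generates four spherical Gaussian modes with σ = 0.1 and runs a classical (batch) k-means from two different initializations; the positions of the modes, the number of observations, and the initial reference vectors are illustrative guesses, not values taken from Fig. 7.2:

```python
import numpy as np

def kmeans(data, w_init, n_iter=50):
    """Classical (batch) k-means: alternate assignment and centroid update."""
    w = w_init.copy()
    for _ in range(n_iter):
        # assign each observation to its closest reference vector
        labels = np.argmin(((data[:, None, :] - w[None, :, :]) ** 2).sum(-1), axis=1)
        # move each reference vector to the center of its assigned observations
        for c in range(len(w)):
            if np.any(labels == c):
                w[c] = data[labels == c].mean(axis=0)
    return w

rng = np.random.default_rng(0)
modes = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])        # four Gaussian modes
data = np.concatenate([m + 0.1 * rng.standard_normal((100, 2)) for m in modes])

w_center = np.array([[0.45, 0.45], [0.45, 0.55], [0.55, 0.45], [0.55, 0.55]])  # symmetric, central
w_corner = np.array([[1.1, -0.1], [1.2, -0.1], [1.1, -0.2], [1.2, -0.2]])      # bottom right
print(kmeans(data, w_center))   # typically recovers the four modes
print(kmeans(data, w_corner))   # typically leaves one reference vector covering two modes
```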
7.2.3 Probabilistic Interpretation of k-Means
The k-means algorithm minimizes the cost function I(W, χ), which is the sum
of the local inertias I_c. We defined that cost function by following
geometric and kinetic intuition, but it is possible to follow another
approach: the cost function has a natural probabilistic interpretation. In
order to gain insight into it, a probabilistic model of data generation must
be defined: we assume that the observations of the training set are an i.i.d.
sample of a mixture of p Gaussian modes,
$$p(z) = \sum_{c=1}^{p} \alpha_c f_c(z), \qquad \text{with} \quad \sum_{c=1}^{p} \alpha_c = 1.$$
Each Gaussian mode has density f_c, with expectation w_c and covariance
matrix Σ_c. Therefore, this density is given by

$$f_c(z) = \frac{1}{(2\pi)^{n/2} \det(\Sigma_c)^{1/2}} \exp\left(-\frac{1}{2}\,(z - w_c)^{T} \Sigma_c^{-1} (z - w_c)\right).$$
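For concreteness, a mixture density of this form might be evaluated as follows (a minimal sketch; the weights, means, and covariance matrices in the example are illustrative, not taken from the text):

```python
import numpy as np

def gaussian_density(z, w, sigma):
    """Multivariate normal density f_c(z) with expectation w and covariance matrix sigma."""
    n = len(w)
    diff = z - w
    quad = diff @ np.linalg.inv(sigma) @ diff                  # (z - w_c)^T Sigma_c^{-1} (z - w_c)
    norm = (2 * np.pi) ** (n / 2) * np.linalg.det(sigma) ** 0.5
    return np.exp(-0.5 * quad) / norm

def mixture_density(z, alphas, means, sigmas):
    """p(z) = sum_c alpha_c f_c(z), with the alpha_c summing to one."""
    return sum(a * gaussian_density(z, w, s) for a, w, s in zip(alphas, means, sigmas))

# illustrative two-mode example in the plane, with spherical modes of standard deviation 0.1
alphas = [0.5, 0.5]
means = [np.array([0.0, 0.0]), np.array([1.0, 1.0])]
sigmas = [0.01 * np.eye(2), 0.01 * np.eye(2)]
print(mixture_density(np.array([0.1, 0.0]), alphas, means, sigmas))
```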
It is well known that a Gaussian mixture model is a general formalism that
can be used for modeling complex probability distributions [Duda et al.
1973]. The mixture assumption states that each observation is a realization
of one of the hidden random variables with normal density f_c. The mode is
 