Self-Organizing Maps and Unsupervised Classification - Neural Networks: Methodology and Applications

The probability density is fully determined by the network architecture, which provides an expression of the conditional probability $p(c_1 \mid c_2)$ using the neighborhood relation on the map and the conditional density of the observation $p(z \mid c_1) = f_{c_1}(z, w_{c_1}, \sigma_{c_1})$. If we assume that the neighborhood relationships permit the definition

$$p(c_1 \mid c_2) = \frac{1}{T_{c_2}} K^T(\delta(c_1, c_2)), \quad \text{with} \quad T_{c_2} = \sum_{r \in C_1} K^T(\delta(c_2, r)),$$

then the posterior probability densities of the observations may be expressed as a function of the Gaussian distributions of the neurons:

$$p_{c_2}(z) = \frac{1}{T_{c_2}} \sum_{r \in C_1} K^T(\delta(c_2, r)) \, f_r(z, w_r, \sigma_r).$$
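As an illustration, the local mixture density can be evaluated numerically for a small map. The following is a minimal sketch, not the book's implementation. It assumes a kernel of the form $K^T(\delta) = \exp(-\delta^2 / T^2)$, a Manhattan distance $\delta$ between grid positions, and spherical Gaussian densities $f_r$; the names `neighborhood_kernel`, `gaussian_density`, `local_mixture_densities` and `grid` are hypothetical and do not come from the text.

```python
import numpy as np

def neighborhood_kernel(delta, T):
    # Assumed kernel shape (not given in the text): K^T(delta) = exp(-delta^2 / T^2)
    return np.exp(-(delta ** 2) / T ** 2)

def gaussian_density(z, W, sigma):
    # Spherical Gaussian f_r(z, w_r, sigma_r), evaluated for every neuron r at once.
    # z: (n,), W: (C, n), sigma: (C,)
    n = z.shape[-1]
    sq_dist = np.sum((z - W) ** 2, axis=-1)
    norm = (2.0 * np.pi) ** (n / 2.0) * sigma ** n
    return np.exp(-sq_dist / (2.0 * sigma ** 2)) / norm

def local_mixture_densities(z, W, sigma, grid, T):
    # delta[c, r]: distance on the map between neurons c and r
    # (Manhattan distance between grid coordinates, an assumption).
    delta = np.abs(grid[:, None, :] - grid[None, :, :]).sum(axis=-1)
    K = neighborhood_kernel(delta, T)      # (C, C)
    T_c = K.sum(axis=1)                    # normalization T_c = sum_r K^T(delta(c, r))
    f = gaussian_density(z, W, sigma)      # (C,)
    return (K @ f) / T_c                   # p_c(z) for every neuron c

# Toy one-dimensional map with three neurons and two-dimensional observations.
grid = np.array([[0], [1], [2]])
W = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]])
sigma = np.array([1.0, 1.0, 1.0])
z = np.array([0.5, 0.5])

p = local_mixture_densities(z, W, sigma, grid, T=1.0)
```

With a very small `T` the kernel is close to zero everywhere except at $\delta = 0$, so each $p_c$ reduces to the single Gaussian $f_c$; larger values of `T` blend in the densities of neighboring neurons.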

Thus, $p_{c_2}(z)$ can be interpreted as a local mixture of Gaussian densities that are associated with the neurons of the map. The set of average vectors $W = \{w_c ;\ c \in C\}$ and the set of scalar standard deviations $\sigma = \{\sigma_c ;\ c \in C\}$ are the parameters to be estimated by training. The probabilistic formalism now makes it possible to maximize the likelihood of the observation set, just as for the probabilistic version of k-means. If the observations of the training set $A$ are assumed to be independent, if each observation $z_i$ is assumed to be generated by the density $p_{\chi(z_i)}$ that is associated with neuron $\chi(z_i)$, and if it is further assumed that the neurons $c_2$ of $C_2$ have similar prior probabilities, the classifying likelihood can be written as

$$p(z_1, z_2, \ldots, z_N \mid W, \sigma, \chi) = \prod_{i=1}^{N} p_{\chi(z_i)}(z_i),$$

which must be maximized with respect to the parameters of the model $W$, $\sigma$ and the allocation function $\chi$. According to the usual strategy, the maximization is performed within the dynamic clustering formalism by minimizing the negative logarithm of the likelihood which, up to the normalization terms $T_{\chi(z_i)}$, is

$$E(W, \sigma, \chi) = -\sum_{i=1}^{N} \ln \left[ \sum_{r \in C} K^T(\delta(\chi(z_i), r)) \, f_r(z_i, w_r, \sigma_r) \right].$$

The phases of allocation and minimization are iterated sequentially and alternately until convergence:

• Allocation phase. Assume that the parameters $\{W, \sigma\}$ have the values computed at the previous iteration or at initialization. Then $E$ must be minimized with respect to the allocation function $\chi$: a new allocation function must be found that assigns each observation $z$ to a neuron. That step generates a new partition of the training data space $D$. It can easily be seen that the optimal allocation function associates with a given observation $z$ the most probable neuron according to the densities $p_{c_2}$:

$$\chi(z) = \arg\max_{c_2} \, p_{c_2}(z).$$
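The allocation phase and the cost $E$ can be sketched together in a few lines of NumPy. This is only an illustrative sketch under stated assumptions: the kernel form $K^T(\delta) = \exp(-\delta^2 / T^2)$, the spherical Gaussian densities, the toy one-dimensional map and the function names (`gaussian_densities`, `kernel`, `allocate`, `cost`) are all hypothetical choices, not taken from the text.

```python
import numpy as np

def gaussian_densities(Z, W, sigma):
    # F[i, r] = f_r(z_i, w_r, sigma_r), assuming spherical Gaussians.
    # Z: (N, n), W: (C, n), sigma: (C,)
    n = Z.shape[1]
    sq_dist = ((Z[:, None, :] - W[None, :, :]) ** 2).sum(axis=2)   # (N, C)
    norm = (2.0 * np.pi) ** (n / 2.0) * sigma ** n                 # (C,)
    return np.exp(-sq_dist / (2.0 * sigma ** 2)) / norm

def kernel(delta, T):
    # Assumed kernel shape (not given in the text): K^T(delta) = exp(-delta^2 / T^2)
    return np.exp(-(delta ** 2) / T ** 2)

def allocate(Z, W, sigma, delta, T):
    # Allocation phase: chi(z_i) = argmax_c p_c(z_i), with W and sigma held fixed.
    F = gaussian_densities(Z, W, sigma)    # (N, C)
    K = kernel(delta, T)                   # (C, C)
    P = (F @ K.T) / K.sum(axis=1)          # P[i, c] = p_c(z_i)
    return P.argmax(axis=1)

def cost(Z, W, sigma, chi, delta, T):
    # E(W, sigma, chi) = - sum_i ln sum_r K^T(delta(chi(z_i), r)) f_r(z_i, w_r, sigma_r)
    F = gaussian_densities(Z, W, sigma)    # (N, C)
    K = kernel(delta, T)                   # (C, C)
    mix = (K[chi] * F).sum(axis=1)         # one mixture value per observation
    return -np.log(mix).sum()

# Toy one-dimensional map with three neurons; delta is the index distance on the map.
W = np.array([[0.0, 0.0], [5.0, 5.0], [10.0, 10.0]])
sigma = np.ones(3)
idx = np.arange(3)
delta = np.abs(idx[:, None] - idx[None, :]).astype(float)
Z = np.array([[0.1, -0.2], [9.8, 10.1], [5.2, 4.9]])

chi = allocate(Z, W, sigma, delta, T=0.5)
E_allocated = cost(Z, W, sigma, chi, delta, T=0.5)
E_arbitrary = cost(Z, W, sigma, np.array([1, 1, 1]), delta, T=0.5)
```

On this toy data each observation is assigned to the neuron whose neighborhood explains it best, and the resulting cost is lower than for an arbitrary assignment of all observations to the middle neuron. For observations far from every neuron the densities can underflow, so a practical version would work with log-densities instead.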
