The probability density is fully determined by the network architecture, which provides an expression for the conditional probability $p(c_1 \mid c_2)$, through the neighborhood relation on the map, and for the conditional density of the observation, $p(z \mid c_1) = f_{c_1}(z, w_{c_1}, \sigma_{c_1})$. If we assume that the neighborhood relationships permit the definition

$$p(c_1 \mid c_2) = \frac{1}{T_{c_2}}\, K^T\!\big(\delta(c_1, c_2)\big), \quad \text{with} \quad T_{c_2} = \sum_{r \in C_1} K^T\!\big(\delta(c_2, r)\big),$$
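For concreteness, the following minimal NumPy sketch computes this conditional distribution on a hypothetical one-dimensional map, assuming a Gaussian neighborhood kernel $K^T(\delta) = \exp(-\delta^2/T^2)$; the map size, the kernel shape, and the temperature $T$ are illustrative choices, not prescribed by the text.

```python
import numpy as np

# Hypothetical 1-D map: delta(c1, c2) is a distance between neuron
# positions on the map grid, not a distance in the data space.
n_neurons = 10
pos = np.arange(n_neurons)
delta = np.abs(pos[:, None] - pos[None, :])   # delta(c1, c2)

T = 2.0                                       # temperature (assumed value)
K = np.exp(-(delta / T) ** 2)                 # K^T(delta), assumed Gaussian kernel

T_c2 = K.sum(axis=0)                          # T_{c2} = sum_r K^T(delta(c2, r))
p_c1_given_c2 = K / T_c2[None, :]             # p(c1 | c2), columns indexed by c2

# Each column of p_c1_given_c2 is a probability distribution over c1.
assert np.allclose(p_c1_given_c2.sum(axis=0), 1.0)
```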
then the posterior probability densities of the observations may be expressed as a function of the Gaussian distributions of the neurons:

$$p_{c_2}(z) = \frac{1}{T_{c_2}} \sum_{r \in C_1} K^T\!\big(\delta(c_2, r)\big)\, f_r(z, w_r, \sigma_r).$$
Thus, $p_{c_2}(z)$ can be interpreted as a local mixture of the Gaussian densities associated with the neurons of the map. The set of average vectors $W = \{w_c;\ c \in C\}$ and the set of scalar standard deviations $\sigma = \{\sigma_c;\ c \in C\}$ are the parameters to be estimated by training.
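In code, the local mixture follows directly from its definition. In this sketch, $f_r$ is an isotropic Gaussian with mean $w_r$ and scalar standard deviation $\sigma_r$; the setup (map, kernel, temperature) carries over from the previous sketch and remains an illustrative assumption.

```python
import numpy as np

# Neighborhood kernel as in the previous sketch (Gaussian, assumed T = 2).
n_neurons, T = 10, 2.0
pos = np.arange(n_neurons)
delta = np.abs(pos[:, None] - pos[None, :])
K = np.exp(-(delta / T) ** 2)
T_c2 = K.sum(axis=1)                    # T_{c2} = sum_r K^T(delta(c2, r))

def gaussian_density(z, w, sigma):
    """Isotropic Gaussian f_r(z, w_r, sigma_r) with scalar standard deviation."""
    d = z.shape[0]
    norm = (2.0 * np.pi * sigma ** 2) ** (-d / 2)
    return norm * np.exp(-np.sum((z - w) ** 2) / (2.0 * sigma ** 2))

def p_c2(z, c2, W, sigma):
    """Local mixture p_{c2}(z) = (1/T_{c2}) sum_r K^T(delta(c2, r)) f_r(z, w_r, sigma_r)."""
    mix = sum(K[c2, r] * gaussian_density(z, W[r], sigma[r])
              for r in range(n_neurons))
    return mix / T_c2[c2]
```

Here $W$ would be an array of one mean vector per neuron and $\sigma$ an array of one standard deviation per neuron, i.e. precisely the parameters to be estimated by training.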
The probabilistic formalism now makes it possible to maximize the likelihood of the observation set, just as for the probabilistic version of $k$-means. If the observations of the training set $A$ are assumed to be independent, if each observation $z_i$ is assumed to be generated by the Gaussian mode $p_{\chi(z_i)}$ associated with the neuron $\chi(z_i)$, and if the neurons $c_2$ of $C_2$ are further assumed to have similar prior probabilities, then the classifying likelihood can be written as

$$p(z_1, z_2, \ldots, z_N \mid W, \sigma, \chi) = \prod_{i=1}^{N} p_{\chi(z_i)}(z_i),$$
which must be maximized with respect to the model parameters $W$ and $\sigma$ and to the allocation function $\chi$. Following the usual strategy, the maximization is performed, within the dynamic clustering formalism, by minimizing the cost function

$$E(W, \sigma, \chi) = -\sum_{i=1}^{N} \ln\!\left[\,\sum_{r \in C_1} K^T\!\big(\delta(\chi(z_i), r)\big)\, f_r(z_i, w_r, \sigma_r)\right].$$
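Under the same illustrative assumptions as in the previous sketches, the cost can be evaluated directly from its definition; up to the constant normalizations $T_{\chi(z_i)}$, $E$ is the negative logarithm of the classifying likelihood above.

```python
import numpy as np

# Illustrative setup as before: Gaussian neighborhood kernel on a 1-D map.
n_neurons, T = 10, 2.0
pos = np.arange(n_neurons)
K = np.exp(-((np.abs(pos[:, None] - pos[None, :]) / T) ** 2))

def gaussian_density(z, w, sigma):
    d = z.shape[0]
    return (2 * np.pi * sigma ** 2) ** (-d / 2) * np.exp(
        -np.sum((z - w) ** 2) / (2 * sigma ** 2))

def cost_E(Z, chi, W, sigma):
    """E(W, sigma, chi) = -sum_i ln[ sum_r K^T(delta(chi(z_i), r)) f_r(z_i) ]."""
    E = 0.0
    for z, c in zip(Z, chi):            # chi[i]: neuron allocated to observation z_i
        mix = sum(K[c, r] * gaussian_density(z, W[r], sigma[r])
                  for r in range(n_neurons))
        E -= np.log(mix)
    return E
```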
The allocation and minimization phases are then iterated alternately until convergence:
Allocation phase. Assume that the parameters $\{W, \sigma\}$ have the values computed at the previous iteration, or their values at initialization. Then $E$ must be minimized with respect to the allocation function $\chi$: a new allocation function must be found that assigns each observation $z$ to a neuron. That step generates a new partition of the training data space $D$. It is easily seen that the optimal allocation function assigns a given observation $z$ to the most probable neuron $c_2$ according to the density $p_{c_2}$:

$$\chi(z) = \arg\max_{c_2 \in C_2}\, p_{c_2}(z).$$
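The allocation step can be sketched under the same assumptions as above; working with log-densities before exponentiating is an implementation choice to limit numerical underflow, not something the text prescribes.

```python
import numpy as np

# Illustrative setup as before.
n_neurons, T = 10, 2.0
pos = np.arange(n_neurons)
K = np.exp(-((np.abs(pos[:, None] - pos[None, :]) / T) ** 2))
T_c2 = K.sum(axis=1)

def log_gaussian(z, w, sigma):
    d = z.shape[0]
    return (-d / 2) * np.log(2 * np.pi * sigma ** 2) \
           - np.sum((z - w) ** 2) / (2 * sigma ** 2)

def allocate(Z, W, sigma):
    """Allocation phase: chi(z) = argmax_{c2} p_{c2}(z), with W and sigma fixed."""
    chi = np.empty(len(Z), dtype=int)
    for i, z in enumerate(Z):
        f = np.exp([log_gaussian(z, W[r], sigma[r]) for r in range(n_neurons)])
        p = (K @ f) / T_c2              # p_{c2}(z) for every candidate neuron c2
        chi[i] = int(np.argmax(p))
    return chi
```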