The probability density is fully determined by the network architecture, which provides an expression of the conditional probability $p(c_1 \mid c_2)$ using the neighborhood relation on the map, and of the conditional density of the observation $p(\mathbf{z} \mid c_1) = f_{c_1}(\mathbf{z}, \mathbf{w}_{c_1}, \sigma_{c_1})$. If we assume that the neighborhood relationships permit the definition

$$p(c_1 \mid c_2) = \frac{1}{T_{c_2}} K^T(\delta(c_1, c_2)), \quad \text{with} \quad T_{c_2} = \sum_{r \in C_1} K^T(\delta(c_2, r)),$$

then the posterior probability densities of the observations may be expressed as a function of the Gaussian distributions of the neurons:

$$p_{c_2}(\mathbf{z}) = \frac{1}{T_{c_2}} \sum_{r \in C_1} K^T(\delta(c_2, r)) \, f_r(\mathbf{z}, \mathbf{w}_r, \sigma_r).$$
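As a rough numerical illustration of this neighborhood-weighted mixture, the sketch below evaluates the density attached to each neuron of a tiny map. Several choices are assumptions made only for the example and are not taken from the text: the map is a one-dimensional chain with distance delta(c, r) = |c - r|, the kernel is taken as K^T(delta) = exp(-delta / T), and the Gaussians are isotropic with a scalar standard deviation. The function names `gaussian`, `kernel` and `local_mixture` are hypothetical.

```python
import math

def gaussian(z, w, sigma):
    """Isotropic Gaussian density f_r(z, w_r, sigma_r) in len(z) dimensions."""
    n = len(z)
    sq_dist = sum((a - b) ** 2 for a, b in zip(z, w))
    norm = (2 * math.pi) ** (n / 2) * sigma ** n
    return math.exp(-sq_dist / (2 * sigma ** 2)) / norm

def kernel(delta, T):
    """Assumed neighborhood kernel K^T(delta) = exp(-delta / T)."""
    return math.exp(-delta / T)

def local_mixture(c2, z, W, sigmas, T):
    """Density p_{c2}(z) on a 1-D chain map, with delta(c, r) = |c - r| (assumption)."""
    weights = [kernel(abs(c2 - r), T) for r in range(len(W))]
    T_c2 = sum(weights)  # normalizing constant T_{c2}
    mix = sum(k * gaussian(z, W[r], sigmas[r]) for r, k in enumerate(weights))
    return mix / T_c2

# Three neurons on a chain, each with a 2-D reference vector.
W = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
sigmas = [0.5, 0.5, 0.5]
z = (0.1, 0.0)  # observation lying close to neuron 0

densities = [local_mixture(c, z, W, sigmas, T=1.0) for c in range(len(W))]
print(densities)
```

Because the observation lies next to the first neuron, the density decreases along the chain. A larger T flattens the kernel, so the three densities move closer together.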
Thus, $p_{c_2}(\mathbf{z})$ can be interpreted as a local mixture of the Gaussian densities associated with the neurons of the map. The set of average vectors $W = \{\mathbf{w}_c;\ c \in C\}$ and the set of scalar standard deviations $\sigma = \{\sigma_c;\ c \in C\}$ are the parameters to be estimated by training. The probabilistic formalism now makes it possible to maximize the likelihood of the observation set, just as for the probabilistic version of k-means. If the observations of the training set A are assumed to be independent, if each observation $\mathbf{z}_i$ is assumed to be generated by the Gaussian mode $p_{\chi(\mathbf{z}_i)}$ that is associated with neuron $\chi(\mathbf{z}_i)$, and if it is further assumed that the neurons $c_2$ of $C_2$ have similar prior probabilities, the classifying likelihood can be written as

$$p(\mathbf{z}_1, \mathbf{z}_2, \ldots, \mathbf{z}_N \mid W, \sigma, \chi) = \prod_{i=1}^{N} p_{\chi(\mathbf{z}_i)}(\mathbf{z}_i),$$
which must be maximized with respect to the parameters of the model, $W$ and $\sigma$, and the allocation function $\chi$. According to the usual strategy, this is done by minimizing the cost function

$$E(W, \sigma, \chi) = -\sum_{i=1}^{N} \ln \left[ \sum_{r \in C} K^T(\delta(\chi(\mathbf{z}_i), r)) \, f_r(\mathbf{z}_i, \mathbf{w}_r, \sigma_r) \right]$$

by using the dynamic clustering formalism. The allocation and minimization phases are iterated alternately until convergence:
• Allocation phase. Assume that the parameters $\{W, \sigma\}$ have the values computed at the previous iteration or at initialization. Then $E$ must be minimized with respect to the allocation function $\chi$: a new allocation function must be found that assigns each observation $\mathbf{z}$ to a neuron. That step generates a new partition of the training data space $D$. It can easily be seen that the optimal allocation function assigns to a given observation $\mathbf{z}_i$ the most probable neuron $c$ according to the density $p_{c_2}$:

$$\chi(\mathbf{z}) = \arg\max_{c_2} p_{c_2}(\mathbf{z})$$
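The allocation rule and the cost E can be tried out on a toy example. The sketch below is only illustrative and rests on assumptions that do not come from the text: a one-dimensional chain map with delta(c, r) = |c - r|, a kernel K^T(delta) = exp(-delta / T), and isotropic Gaussians. The helper names `allocate` and `cost` are hypothetical. The cost follows the expression for E as written, which uses the kernel-weighted sum without the normalizing constant, while the allocation uses the normalized density.

```python
import math

def gaussian(z, w, sigma):
    """Isotropic Gaussian density with scalar standard deviation sigma."""
    n = len(z)
    sq_dist = sum((a - b) ** 2 for a, b in zip(z, w))
    return math.exp(-sq_dist / (2 * sigma ** 2)) / ((2 * math.pi) ** (n / 2) * sigma ** n)

def weighted_sum(c, z, W, sigmas, T):
    """sum_r K^T(delta(c, r)) f_r(z, w_r, sigma_r), with the assumed chain map and kernel."""
    return sum(math.exp(-abs(c - r) / T) * gaussian(z, W[r], sigmas[r])
               for r in range(len(W)))

def local_mixture(c, z, W, sigmas, T):
    """Normalized density p_c(z): the weighted sum divided by T_c."""
    T_c = sum(math.exp(-abs(c - r) / T) for r in range(len(W)))
    return weighted_sum(c, z, W, sigmas, T) / T_c

def allocate(data, W, sigmas, T):
    """Allocation phase: chi(z_i) = argmax over neurons c of p_c(z_i)."""
    neurons = range(len(W))
    return [max(neurons, key=lambda c: local_mixture(c, z, W, sigmas, T)) for z in data]

def cost(data, chi, W, sigmas, T):
    """E(W, sigma, chi) = -sum_i ln of the kernel-weighted sum at neuron chi(z_i)."""
    return -sum(math.log(weighted_sum(c, z, W, sigmas, T)) for z, c in zip(data, chi))

W = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
sigmas = [0.5, 0.5, 0.5]
data = [(0.1, 0.0), (1.1, -0.2), (1.9, 0.1)]

chi = allocate(data, W, sigmas, T=1.0)
print("allocation:", chi)
print("E at this allocation:", cost(data, chi, W, sigmas, T=1.0))
print("E with the end neurons swapped:", cost(data, [2, 1, 0], W, sigmas, T=1.0))
```

On this toy data each observation is allocated to the neuron whose reference vector is nearest, and the resulting value of E is lower than for the deliberately swapped allocation. The parameters W and sigma are held fixed here, as in the allocation phase.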