Self-Organizing Maps and Unsupervised Classification - Neural Networks: Methodology and Applications



Fig. 7.7. Families of kernel functions that are used to control the neighborhood on the map; x-axis: distance on the map (length of the shortest path between two neurons). The curves show the kernels for different values of $T$; from top to bottom, $T$ takes on values from 10 to 1. (a) $K_T(\delta(c_1,c_2)) = \exp(-0.5\,\delta(c_1,c_2)/T)$; (b) $K_T(\delta(c_1,c_2)) = \exp(-0.5\,\delta^2(c_1,c_2)/T^2)$.
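The two kernel families of the caption, and the map distance $\delta$ on which they are evaluated, can be sketched as follows. This is a minimal illustration that assumes a rectangular map whose neurons are linked to their four nearest neighbors; the helper names (`grid_neighbors`, `map_distance`, `kernel_a`, `kernel_b`) are hypothetical. Since $\delta$ is defined as the length of a shortest path between two neurons, it is computed here by breadth-first search on the map graph.

```python
from collections import deque
import math

def grid_neighbors(node, rows, cols):
    """Neighbors of neuron (r, c) on a rows x cols map (assumed 4-connected topology)."""
    r, c = node
    for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        nr, nc = r + dr, c + dc
        if 0 <= nr < rows and 0 <= nc < cols:
            yield (nr, nc)

def map_distance(c1, c2, rows, cols):
    """delta(c1, c2): length of the shortest path between two neurons (breadth-first search)."""
    dist = {c1: 0}
    queue = deque([c1])
    while queue:
        node = queue.popleft()
        if node == c2:
            return dist[node]
        for nb in grid_neighbors(node, rows, cols):
            if nb not in dist:
                dist[nb] = dist[node] + 1
                queue.append(nb)
    raise ValueError("c2 is not reachable from c1")

def kernel_a(delta, T):
    """Kernel family (a): exp(-0.5 * delta / T)."""
    return math.exp(-0.5 * delta / T)

def kernel_b(delta, T):
    """Kernel family (b): exp(-0.5 * delta^2 / T^2)."""
    return math.exp(-0.5 * delta ** 2 / T ** 2)

if __name__ == "__main__":
    d = map_distance((0, 0), (2, 3), rows=5, cols=5)
    print(d)  # 5 on a 4-connected grid
    for T in (10, 5, 1):
        # Larger T gives a wider neighborhood, hence a larger kernel value at the same distance.
        print(T, round(kernel_a(d, T), 4), round(kernel_b(d, T), 4))
```

On this particular grid the breadth-first distance equals the Manhattan distance; the search is kept general so that another map topology (for example a hexagonal grid) would only need a different neighbor function.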

In that relation, $\chi$ is an allocation function, and $W$ is the set of the $p$ reference vectors of the map. $\chi(z_i)$ denotes the neuron of the map $C$ that is associated with the observation $z_i$, and $\delta(c,\chi(z_i))$ is the distance on the map $C$ between a neuron $c$ and the neuron allocated to observation $z_i$. As with the k-means algorithm, the links between the map and the data space can be visualized. In fact, the basic principles of the two algorithms are very similar, as shown in Fig. 7.8; the difference is that the set of labels shown in Fig. 7.1 is replaced by the label graph of the map. The cost function $J_{\mathrm{som}}$ is a mere extension of the k-means cost function

$$I(W,\chi) = \sum_{z_i \in A} \left\| z_i - w_{\chi(z_i)} \right\|^2,$$

where the squared Euclidean distance between an observation $z_i$ and its associated reference vector is replaced by a generalized distance, denoted $d_T$, which takes into account all the neurons of the map:

$$d_T\left(z_i, w_{\chi(z_i)}\right) = \sum_{c \in C} K_T\left(\delta(c,\chi(z_i))\right) \left\| z_i - w_c \right\|^2.$$
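The generalized distance and the two cost functions can be evaluated directly from these definitions. The sketch below reads $J_{\mathrm{som}}$ as the sum of $d_T$ over the observations of $A$, uses kernel (b) of Fig. 7.7 on a 4-connected rectangular map, and, purely for illustration, takes $\chi(z)$ to be the neuron whose reference vector is nearest to $z$. The map, the data, and every function name are hypothetical.

```python
import math

def sq_dist(x, y):
    """Squared Euclidean distance ||x - y||^2."""
    return sum((a - b) ** 2 for a, b in zip(x, y))

def grid_delta(c1, c2):
    """delta(c1, c2) on a 4-connected rectangular map: the shortest path is the Manhattan distance."""
    return abs(c1[0] - c2[0]) + abs(c1[1] - c2[1])

def kernel(delta, T):
    """Kernel family (b) of Fig. 7.7."""
    return math.exp(-0.5 * delta ** 2 / T ** 2)

def d_T(z, winner, W, T):
    """Generalized distance: sum over every neuron c of K_T(delta(c, chi(z))) * ||z - w_c||^2."""
    return sum(kernel(grid_delta(c, winner), T) * sq_dist(z, w_c) for c, w_c in W.items())

def J_som(A, chi, W, T):
    """k-means cost with the squared Euclidean distance replaced by d_T."""
    return sum(d_T(z, chi(z), W, T) for z in A)

def I_kmeans(A, chi, W):
    """k-means cost I(W, chi): sum of ||z - w_chi(z)||^2."""
    return sum(sq_dist(z, W[chi(z)]) for z in A)

# Hypothetical 2 x 2 map: neuron (row, col) -> reference vector w_c.
W = {(0, 0): (0.0, 0.0), (0, 1): (1.0, 0.0), (1, 0): (0.0, 1.0), (1, 1): (1.0, 1.0)}
A = [(0.1, 0.0), (0.9, 1.0)]

def chi(z):
    """Assumed allocation for illustration: the neuron with the nearest reference vector."""
    return min(W, key=lambda c: sq_dist(z, W[c]))

print(I_kmeans(A, chi, W))      # about 0.02
print(J_som(A, chi, W, T=1.0))  # about 2.72: the winner's map neighbors also contribute
```

Because the term $c = \chi(z_i)$ carries weight $K_T(0) = 1$ and all other terms are nonnegative, $J_{\mathrm{som}}$ can never be smaller than $I(W,\chi)$ for the same allocation.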

Note that the distance between $z$ and $w_{\chi(z)}$, as expressed by the distance function $d_T$, is a weighted sum of the squared Euclidean distances between $z$ and all the reference vectors in the neighborhood of the neuron $\chi(z)$. For the kernels of Fig. 7.7, $K_T(0) = 1$, while the weights of all other neurons vanish as $T$ decreases; hence the function $J_{\mathrm{som}}$ reduces to $I(W,\chi)$ when $T$ is small enough. In that case, $d_T$ coincides with the squared Euclidean distance.
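This limiting behavior can be checked numerically. The sketch below is a hypothetical example that assumes kernel (a) of Fig. 7.7, a 3 x 3 rectangular map with the Manhattan (shortest-path) map distance, an allocation $\chi$ chosen as the nearest reference vector, and 20 random points in the unit square; none of these choices come from the text. The gap $J_{\mathrm{som}} - I$ shrinks toward zero as $T$ decreases.

```python
import math
import random

def sq_dist(x, y):
    """Squared Euclidean distance ||x - y||^2."""
    return sum((a - b) ** 2 for a, b in zip(x, y))

def kernel_a(delta, T):
    """Kernel family (a) of Fig. 7.7: exp(-0.5 * delta / T)."""
    return math.exp(-0.5 * delta / T)

def costs(A, W, T):
    """Return (J_som, I) for a map W given as a dict neuron (row, col) -> reference vector.

    Assumptions for illustration: chi(z) is the neuron with the nearest reference
    vector, and delta is the Manhattan distance on a 4-connected grid.
    """
    J = I = 0.0
    for z in A:
        winner = min(W, key=lambda c: sq_dist(z, W[c]))
        I += sq_dist(z, W[winner])
        J += sum(kernel_a(abs(c[0] - winner[0]) + abs(c[1] - winner[1]), T) * sq_dist(z, w)
                 for c, w in W.items())
    return J, I

rng = random.Random(0)                                   # fixed seed for reproducibility
A = [(rng.random(), rng.random()) for _ in range(20)]    # hypothetical data
W = {(r, c): (c / 2, r / 2) for r in range(3) for c in range(3)}  # hypothetical 3 x 3 map

gaps = []
for T in (1.0, 0.5, 0.1, 0.01):
    J, I = costs(A, W, T)
    gaps.append(J - I)
    print(f"T={T:<5} J_som={J:.6f}  I={I:.6f}  gap={J - I:.2e}")
```

For $\delta \ge 1$, kernel (a) behaves like $e^{-0.5/T}$, which at $T = 0.01$ is about $e^{-50}$, far below double-precision resolution relative to the winner's own term.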

The minimization of the cost function $J_{\mathrm{som}}(\chi,W)$ can be performed in different ways, depending on whether an adaptive or a batch optimization is desired. In addition, a probabilistic formalism leads to a third version, which explicitly estimates probability densities. These three versions of the topological map training algorithm are presented in the following sections.
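For the adaptive case, one way to see what minimizing $J_{\mathrm{som}}$ involves is to differentiate $d_T$ with respect to a single reference vector while $\chi$ is held fixed: $\partial d_T / \partial w_c = -2\,K_T(\delta(c,\chi(z)))\,(z - w_c)$. The sketch below applies one stochastic gradient step for a single observation. It is derived only from the formula for $d_T$ and is not the training algorithm of the following sections; kernel (b), the 4-connected grid, the nearest-reference-vector allocation, the step size `eps`, and all names are assumptions.

```python
import math

def sq_dist(x, y):
    """Squared Euclidean distance ||x - y||^2."""
    return sum((a - b) ** 2 for a, b in zip(x, y))

def kernel(delta, T):
    """Kernel family (b) of Fig. 7.7."""
    return math.exp(-0.5 * delta ** 2 / T ** 2)

def grid_delta(c1, c2):
    """Shortest-path length on a 4-connected rectangular map (Manhattan distance)."""
    return abs(c1[0] - c2[0]) + abs(c1[1] - c2[1])

def d_T(z, winner, W, T):
    """Generalized distance d_T(z, w_winner), summed over all neurons of the map."""
    return sum(kernel(grid_delta(c, winner), T) * sq_dist(z, w) for c, w in W.items())

def adaptive_step(z, W, T, eps):
    """One stochastic gradient step on d_T for a single observation z.

    chi(z) is fixed first (assumed: nearest reference vector); every w_c then
    moves toward z by eps * K_T(delta(c, chi(z))), with the factor 2 of the
    gradient folded into eps.
    """
    winner = min(W, key=lambda c: sq_dist(z, W[c]))
    new_W = {}
    for c, w in W.items():
        k = kernel(grid_delta(c, winner), T)
        new_W[c] = tuple(wi + eps * k * (zi - wi) for wi, zi in zip(w, z))
    return new_W, winner

# Hypothetical 2 x 2 map and a single observation.
W = {(0, 0): (0.0, 0.0), (0, 1): (1.0, 0.0), (1, 0): (0.0, 1.0), (1, 1): (1.0, 1.0)}
z = (0.3, 0.2)
W_new, winner = adaptive_step(z, W, T=1.0, eps=0.1)
print(winner, d_T(z, winner, W, 1.0), d_T(z, winner, W_new, 1.0))
```

With $0 < \varepsilon K_T \le 1$, each $\|z - w_c\|^2$ is multiplied by $(1 - \varepsilon K_T)^2$, so $d_T$ cannot increase; neurons close to $\chi(z)$ on the map move most toward $z$.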
