The centroids µ j are initialized arbitrarily, e.g. to randomly chosen examples. Each
iteration consists of two steps:
Step 1: Assign each data vector to the closest centroid: c_i = argmin_{1 ≤ j ≤ K} ‖x_i − µ_j‖.
Step 2: Move the centroids to the mean of the assigned examples: µ_j = ⟨x_i⟩_{c_i = j}.
In a probabilistic framework, the K -means algorithm is a special case of the
expectation-maximization (EM) algorithm [52] applied to mixture density estima-
tion. Since each step decreases the quantization error until the assignment does not
change any more, the K -means algorithm finds a local minimum of the error func-
tion 5.1. The quality of the approximation depends on the number of centroids and
on the initialization. One can start with a low number of clusters that are split re-
cursively to avoid initialization problems and to determine the number of clusters
needed.
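The two-step iteration above can be sketched in NumPy as follows; the function name and the convergence check are illustrative, and ties as well as empty clusters are resolved in a simple way that the text does not prescribe:

```python
import numpy as np

def kmeans(X, K, n_iter=100, seed=0):
    """Alternate assignment (Step 1) and centroid update (Step 2)."""
    rng = np.random.default_rng(seed)
    # initialize centroids to randomly chosen examples
    mu = X[rng.choice(len(X), size=K, replace=False)]
    for _ in range(n_iter):
        # Step 1: assign each x_i to the closest centroid
        d = np.linalg.norm(X[:, None, :] - mu[None, :, :], axis=2)
        c = d.argmin(axis=1)
        # Step 2: move each centroid to the mean of its assigned examples
        # (a centroid with no assigned examples is left where it is)
        new_mu = np.array([X[c == j].mean(axis=0) if np.any(c == j) else mu[j]
                           for j in range(K)])
        if np.allclose(new_mu, mu):  # assignments no longer change
            break
        mu = new_mu
    return mu, c
```

Since each step can only decrease the quantization error, the loop terminates in a local minimum that depends on the random initialization, as discussed above.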
Competitive Learning. The algorithm described above can be viewed as compe-
tition between the centroids to respond to the data vectors. Similar competition is
achieved in winner-takes-all (WTA) networks. These neural networks consist of an
input layer and a layer of laterally connected processing elements which assess the
similarity between the current data vector x_i and their weight vector w_j. The assessment can be done using a distance function d(x_i, w_j), as in self-organizing maps (SOM) [126], or by computing the scalar product x_i · w_j. If the weight vectors and
the inputs are normalized, the scalar product equals the cosine of the angle spanned
by both vectors.
The unit with the smallest distance or the largest scalar product is called the
winner. Its output is set to one, while the outputs of all other units are set to zero. This
operation requires global information about the similarities. It can be implemented
by lateral inhibition.
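A minimal sketch of this winner selection, assuming three units with hypothetical two-dimensional weight vectors; it also illustrates that for unit-length vectors the scalar product equals the cosine of the enclosed angle:

```python
import numpy as np

# hypothetical weight vectors of three WTA units, and one input vector
W = np.array([[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]])
x = np.array([0.6, 0.8])

# winner by smallest distance d(x_i, w_j)
winner_dist = int(np.argmin(np.linalg.norm(W - x, axis=1)))

# winner by largest scalar product; after normalization the scalar
# product is the cosine of the angle between input and weight vector
Wn = W / np.linalg.norm(W, axis=1, keepdims=True)
xn = x / np.linalg.norm(x)
winner_dot = int(np.argmax(Wn @ xn))

# winner-takes-all output: winning unit outputs one, all others zero
y = np.zeros(len(W))
y[winner_dot] = 1.0
```

In a network this global comparison would be realized by lateral inhibition rather than by an explicit argmax.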
Competitive learning [201] in WTA networks can be achieved by adapting only
the weight vector w k of the winning unit k . One possibility is to add a fraction of
the current data vector: ∆w_k = η x_i, where η is a learning rate that decreases over
time, followed by a renormalization of the weight length: w_k ← w_k / ‖w_k‖. This
leads to a redistribution of weight strengths from the weights connecting to inactive
input components to the weights connecting to active inputs. Hence, the unit will
respond more strongly if the same input is presented again. The weight vectors of
the units losing the competition remain unchanged.
It is also possible to use the difference between the weight vector and the input
to make the weights similar to the inputs: ∆w_k = η (x_i − w_k). If the network has
a topological structure, such as in the SOM, neighboring units can be moved in the
same direction to produce a topological feature map.
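Both update rules for the winning unit can be sketched as follows; the function name and the rule labels are illustrative, and only the winner is adapted, leaving the losing units unchanged:

```python
import numpy as np

def wta_update(W, x, eta, rule="difference"):
    """Adapt only the winning unit's weight vector w_k."""
    k = int(np.argmin(np.linalg.norm(W - x, axis=1)))  # winner by distance
    W = W.copy()
    if rule == "additive":
        # add a fraction of the input, then renormalize the weight length
        W[k] = W[k] + eta * x
        W[k] = W[k] / np.linalg.norm(W[k])
    else:
        # move the winner's weights toward the input: dw_k = eta * (x - w_k)
        W[k] = W[k] + eta * (x - W[k])
    return W, k
```

In a SOM, units in a topological neighborhood of the winner would receive a similar (distance-weighted) update, which produces the topological feature map mentioned above.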
Principal Component Analysis. One of the key ingredients of unsupervised learning methods is the Hebbian [91] weight update: ∆w_i = η x_i y, where x_i is the activity
of the presynaptic unit and y = Σ_i w_i x_i = w · x is the activity of the postsynaptic unit. The Hebbian learning rule can be stated informally as: 'Neurons that
fire together, wire together.' If the inputs are zero mean, it captures the correlations