The centroids µ j are initialized arbitrarily, e.g. to randomly chosen examples. Each
iteration consists of two steps:
Step 1: Assign each data vector to the closest centroid: c_i = argmin_{1 ≤ j ≤ K} ‖x_i − µ_j‖.
Step 2: Move the centroids to the mean of the assigned examples: µ_j = ⟨x_i⟩_{c_i = j}.
In a probabilistic framework, the K -means algorithm is a special case of the
expectation-maximization (EM) algorithm [52] applied to mixture density estima-
tion. Since each step decreases the quantization error until the assignment does not
change any more, the K -means algorithm finds a local minimum of the error func-
tion 5.1. The quality of the approximation depends on the number of centroids and
on the initialization. One can start with a low number of clusters that are split re-
cursively to avoid initialization problems and to determine the number of clusters
needed.
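The two-step iteration above can be sketched in NumPy as follows; the function name and the convergence check are illustrative, and ties as well as empty clusters are resolved in a simple way that the text does not prescribe:

```python
import numpy as np

def kmeans(X, K, n_iter=100, seed=0):
    """Alternate assignment (Step 1) and centroid update (Step 2)."""
    rng = np.random.default_rng(seed)
    # initialize centroids to randomly chosen examples
    mu = X[rng.choice(len(X), size=K, replace=False)]
    for _ in range(n_iter):
        # Step 1: assign each x_i to the closest centroid
        d = np.linalg.norm(X[:, None, :] - mu[None, :, :], axis=2)
        c = d.argmin(axis=1)
        # Step 2: move each centroid to the mean of its assigned examples
        # (a centroid with no assigned examples is left where it is)
        new_mu = np.array([X[c == j].mean(axis=0) if np.any(c == j) else mu[j]
                           for j in range(K)])
        if np.allclose(new_mu, mu):  # assignments no longer change
            break
        mu = new_mu
    return mu, c
```

Since each step can only decrease the quantization error, the loop terminates in a local minimum that depends on the random initialization, as discussed above.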
Competitive Learning. The algorithm described above can be viewed as compe-
tition between the centroids to respond to the data vectors. Similar competition is
achieved in winner-takes-all (WTA) networks. These neural networks consist of an
input layer and a layer of laterally connected processing elements which assess the
similarity between the current data vector x_i and their weight vector w_j. The assessment can be done using a distance function d(x_i, w_j), as in self-organizing maps (SOM) [126], or by computing the scalar product x_i · w_j. If the weight vectors and
the inputs are normalized, the scalar product equals the cosine of the angle spanned
by both vectors.
The unit with the smallest distance or the largest scalar product is called the
winner. Its output is set to one, while the outputs of all other units are set to zero. This
operation requires global information about the similarities. It can be implemented
by lateral inhibition.
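A minimal sketch of this winner selection, assuming three units with hypothetical two-dimensional weight vectors; it also illustrates that for unit-length vectors the scalar product equals the cosine of the enclosed angle:

```python
import numpy as np

# hypothetical weight vectors of three WTA units, and one input vector
W = np.array([[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]])
x = np.array([0.6, 0.8])

# winner by smallest distance d(x_i, w_j)
winner_dist = int(np.argmin(np.linalg.norm(W - x, axis=1)))

# winner by largest scalar product; after normalization the scalar
# product is the cosine of the angle between input and weight vector
Wn = W / np.linalg.norm(W, axis=1, keepdims=True)
xn = x / np.linalg.norm(x)
winner_dot = int(np.argmax(Wn @ xn))

# winner-takes-all output: winning unit outputs one, all others zero
y = np.zeros(len(W))
y[winner_dot] = 1.0
```

In a network this global comparison would be realized by lateral inhibition rather than by an explicit argmax.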
Competitive learning [201] in WTA networks can be achieved by adapting only
the weight vector w k of the winning unit k . One possibility is to add a fraction of
the current data vector: ∆w_k = η x_i, where η is a learning rate that decreases over
time, followed by a renormalization of the weight length: w_k ← w_k / ‖w_k‖. This
leads to a redistribution of weight strengths from the weights connecting to inactive
input components to the weights connecting to active inputs. Hence, the unit will
respond more strongly if the same input is presented again. The weight vectors of
the units losing the competition remain unchanged.
It is also possible to use the difference between the weight vector and the input
to make the weights similar to the inputs: ∆w_k = η (x_i − w_k). If the network has
a topological structure, such as in the SOM, neighboring units can be moved in the
same direction to produce a topological feature map.
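Both update rules for the winning unit can be sketched as follows; the function name and the rule labels are illustrative, and only the winner is adapted, leaving the losing units unchanged:

```python
import numpy as np

def wta_update(W, x, eta, rule="difference"):
    """Adapt only the winning unit's weight vector w_k."""
    k = int(np.argmin(np.linalg.norm(W - x, axis=1)))  # winner by distance
    W = W.copy()
    if rule == "additive":
        # add a fraction of the input, then renormalize the weight length
        W[k] = W[k] + eta * x
        W[k] = W[k] / np.linalg.norm(W[k])
    else:
        # move the winner's weights toward the input: dw_k = eta * (x - w_k)
        W[k] = W[k] + eta * (x - W[k])
    return W, k
```

In a SOM, units in a topological neighborhood of the winner would receive a similar (distance-weighted) update, which produces the topological feature map mentioned above.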
Principal Component Analysis. One of the key ingredients of unsupervised learning methods is the Hebbian [91] weight update: ∆w_i = η x_i y, where x_i is the activity
of the presynaptic unit and y = Σ_i w_i x_i = w · x is the activity of the postsynaptic unit. The Hebbian learning rule can be stated informally as: 'Neurons that
fire together, wire together.' If the inputs are zero mean, it captures the correlations