$C_{ij} = \langle x_i x_j \rangle$ between the input units: $\langle \Delta w \rangle = \eta C w$. The correlation matrix $C$ can
be viewed as a linear transformation of the weight vector. In the long run, the eigen-
vector $e$ with the largest eigenvalue $\lambda$ will dominate the weight change: $C e = \lambda e$.
If the Hebbian rule is combined with normalization, the weights develop towards
the principal component of the data.
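As a rough numerical sketch of this behaviour (not part of the original text; the two-dimensional data, learning rate, and seed below are chosen purely for illustration), normalized Hebbian learning can be written in a few lines of Python:

```python
import numpy as np

rng = np.random.default_rng(0)

# Zero-mean data with one dominant direction (illustrative only).
X = rng.normal(size=(1000, 2)) * np.array([3.0, 0.5])
C = X.T @ X / len(X)              # correlation matrix C_ij = <x_i x_j>

w = rng.normal(size=2)
eta = 0.01
for x in X:
    w += eta * (w @ x) * x        # Hebbian update: Delta w = eta * y * x, with y = w.x
    w /= np.linalg.norm(w)        # explicit normalization keeps |w| = 1

e = np.linalg.eigh(C)[1][:, -1]   # eigenvector of C with the largest eigenvalue
print(abs(w @ e))                 # close to 1: w aligns with the principal component
```

With the normalization step removed, the same loop also illustrates the instability discussed next: the norm of $w$ grows without bound.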
Generic Hebbian learning is unstable since the weights grow without limits. To
avoid this effect, Oja [169] proposed adding a weight-decay term to the update rule:
$\Delta w_i = \eta y (x_i - y w_i)$. It implements a self-normalization of the weight vector.
The unit's output $y$ represents the orthogonal projection of a data vector $x$ onto the
weight vector $w$. Its variance $\langle y^2 \rangle$ is maximized by the learning rule.
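Continuing the illustrative sketch above (again with invented data and parameters), Oja's rule needs no explicit normalization step:

```python
import numpy as np

def oja_update(w, x, eta=0.01):
    """One step of Oja's rule: Delta w_i = eta * y * (x_i - y * w_i)."""
    y = w @ x                     # output: projection of x onto w
    return w + eta * y * (x - y * w)

rng = np.random.default_rng(1)
X = rng.normal(size=(2000, 3)) * np.array([2.0, 1.0, 0.3])   # zero-mean toy data
w = rng.normal(size=3)
for x in X:
    w = oja_update(w, x)

print(np.linalg.norm(w))          # approaches 1: the decay term self-normalizes w
```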
If more than the first principal component is desired, the reconstruction $r = y\,w^T = w x\, w^T$ of the data vector that is based on $y$ can be subtracted from $x$ to
produce new examples $x' = x - r$, which can be analyzed by a second unit to
extract the second principal component. Another possibility is to extend the learn-
ing rule for a multi-unit network to $\Delta w_r = \eta y_r \bigl( x - \sum_{s \le r} y_s w_s \bigr)$, as proposed
by Sanger [203]. The principal component analysis (PCA) network decorrelates its
outputs $y_k$ and hence removes linear dependencies from the input. Because the num-
ber of output units can be chosen smaller than the number of input components, the
linear PCA transformation can be used to reduce the dimensionality of the data with
minimal loss of variance.
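A sketch of Sanger's rule for a small multi-unit network (toy data and parameters invented for illustration); each unit $r$ subtracts the reconstruction formed by the units $s \le r$ before applying the Hebbian term:

```python
import numpy as np

def sanger_update(W, x, eta=0.005):
    """Sanger's rule: Delta w_r = eta * y_r * (x - sum_{s<=r} y_s w_s).
    W holds one weight vector per output unit (row-wise)."""
    y = W @ x
    for r in range(W.shape[0]):
        recon = y[: r + 1] @ W[: r + 1]        # sum_{s<=r} y_s w_s
        W[r] += eta * y[r] * (x - recon)
    return W

rng = np.random.default_rng(2)
X = rng.normal(size=(5000, 4)) * np.array([3.0, 2.0, 1.0, 0.2])  # zero-mean toy data
W = rng.normal(scale=0.1, size=(2, 4))         # two units: first two principal components
for x in X:
    W = sanger_update(W, x)

print(W @ W.T)                    # roughly the 2x2 identity: orthonormal weight vectors
```

Using only two output units for four input components, as in this sketch, corresponds to the dimensionality reduction described above.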
Independent Component Analysis. Another unsupervised learning technique is
called independent component analysis (ICA) [115, 26]. Its goal is to find a linear
transformation of the data vectors $x$ that produces a representation $y = Wx$ with
components which are not only uncorrelated, but statistically independent. This is
motivated by the assumption that the data vectors have been produced as a linear
mixture $x = As$ of independent sources $s_i$. If this is the case, ICA can be used
to separate the sources by estimating an unmixing matrix $W = A^{-1}$, a problem
known as blind source separation.
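As an illustration only (the sources and the mixing matrix $A$ below are invented, and scikit-learn's FastICA is just one of many ICA implementations), blind source separation can be sketched as follows:

```python
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(3)
n = 10000
t = np.linspace(0, 8, n)

# Two non-Gaussian sources: a sine wave and uniform noise.
S = np.c_[np.sin(2 * np.pi * t), rng.uniform(-1, 1, n)]
A = np.array([[1.0, 0.5],
              [0.7, 1.2]])        # invented mixing matrix
X = S @ A.T                       # observed mixtures x = A s

ica = FastICA(n_components=2, random_state=0)
Y = ica.fit_transform(X)          # estimated sources y = W x
# Y recovers S only up to permutation, sign, and scaling of the components.
```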
ICA is applicable if at most one of the sources has a Gaussian distribution. Prin-
cipal component analysis and whitening are usually required as preprocessing steps
to remove second-order correlations from the data vectors. This discards information
about the sign and amplitude of the sources.
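A minimal whitening sketch, assuming PCA-based preprocessing of the data (the small constant eps is added only for numerical safety):

```python
import numpy as np

def whiten(X, eps=1e-9):
    """PCA whitening: decorrelate the components and rescale them to unit variance."""
    Xc = X - X.mean(axis=0)                 # center the data
    C = np.cov(Xc, rowvar=False)            # covariance of the data vectors
    eigval, eigvec = np.linalg.eigh(C)
    W = eigvec / np.sqrt(eigval + eps)      # whitening transformation
    return Xc @ W                           # whitened data with covariance ~ identity
```

Rescaling every direction to unit variance is exactly what removes the amplitude information mentioned above.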
Some ICA methods use the fact that if two sources $s_1$ and $s_2$ are independent,
then any nonlinear transformations $g(s_1)$ and $h(s_2)$ are uncorrelated. Thus, they
perform nonlinear decorrelation to separate the sources.
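A quick numerical check of this property (the nonlinearities $g$ and $h$ and the source distributions are chosen arbitrarily):

```python
import numpy as np

rng = np.random.default_rng(4)
s1 = rng.uniform(-1, 1, 100_000)          # two independent sources
s2 = rng.laplace(size=100_000)
g, h = np.tanh, np.sign                   # arbitrary nonlinear transformations
print(np.corrcoef(g(s1), h(s2))[0, 1])    # near 0: g(s1) and h(s2) stay uncorrelated
```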
According to the central limit theorem, sums of non-Gaussian random variables
are closer to a Gaussian than the original variables. This is exploited by ICA meth-
ods that maximize the non-Gaussianity of the output components. To measure non-
Gaussianity, higher-order cumulants are used, such as the kurtosis, a normalized
form of the fourth central moment that measures the peakedness of a distribution.
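The following sketch, using scipy.stats.kurtosis (which reports the excess kurtosis, 0 for a Gaussian), illustrates both points with two invented uniform sources:

```python
import numpy as np
from scipy.stats import kurtosis          # excess kurtosis: 0 for a Gaussian

rng = np.random.default_rng(5)
s1 = rng.uniform(-1, 1, 100_000)          # sub-Gaussian source, kurtosis around -1.2
s2 = rng.uniform(-1, 1, 100_000)          # second independent source of the same kind
mix = (s1 + s2) / np.sqrt(2)              # a sum of the two sources

print(kurtosis(s1), kurtosis(s2))         # clearly non-zero: non-Gaussian sources
print(kurtosis(mix))                      # around -0.6: closer to the Gaussian value 0
```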
Because the estimation principles discussed above use non-quadratic functions,
the computations needed usually cannot be expressed using simple linear algebra.