$C_{ij} = \langle x_i x_j \rangle$ between the input units: $\langle \Delta w \rangle = \eta C w$. The correlation matrix $C$ can
be viewed as a linear transformation of the weight vector. In the long run, the eigen-
vector $e$ with the largest eigenvalue $\lambda$ will dominate the weight change: $C e = \lambda e$.
If the Hebbian rule is combined with normalization, the weights develop towards
the principal component of the data.
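As a rough numerical sketch of this behaviour (not part of the original text; the two-dimensional data, learning rate, and seed below are chosen purely for illustration), normalized Hebbian learning can be written in a few lines of Python:

```python
import numpy as np

rng = np.random.default_rng(0)

# Zero-mean data with one dominant direction (illustrative only).
X = rng.normal(size=(1000, 2)) * np.array([3.0, 0.5])
C = X.T @ X / len(X)              # correlation matrix C_ij = <x_i x_j>

w = rng.normal(size=2)
eta = 0.01
for x in X:
    w += eta * (w @ x) * x        # Hebbian update: Delta w = eta * y * x, with y = w.x
    w /= np.linalg.norm(w)        # explicit normalization keeps |w| = 1

e = np.linalg.eigh(C)[1][:, -1]   # eigenvector of C with the largest eigenvalue
print(abs(w @ e))                 # close to 1: w aligns with the principal component
```

With the normalization step removed, the same loop also illustrates the instability discussed next: the norm of $w$ grows without bound.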
Generic Hebbian learning is unstable since the weights grow without limits. To
avoid this effect, Oja [169] proposed adding a weight-decay term to the update rule:
$\Delta w_i = \eta y (x_i - y w_i)$. It implements a self-normalization of the weight vector.
The unit's output $y$ represents the orthogonal projection of a data vector $x$ onto the
weight vector $w$. Its variance $\langle y^2 \rangle$ is maximized by the learning rule.
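Continuing the illustrative sketch above (again with invented data and parameters), Oja's rule needs no explicit normalization step:

```python
import numpy as np

def oja_update(w, x, eta=0.01):
    """One step of Oja's rule: Delta w_i = eta * y * (x_i - y * w_i)."""
    y = w @ x                     # output: projection of x onto w
    return w + eta * y * (x - y * w)

rng = np.random.default_rng(1)
X = rng.normal(size=(2000, 3)) * np.array([2.0, 1.0, 0.3])   # zero-mean toy data
w = rng.normal(size=3)
for x in X:
    w = oja_update(w, x)

print(np.linalg.norm(w))          # approaches 1: the decay term self-normalizes w
```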
If more than the first principal component is desired, the reconstruction $r = y\,w^T = w x\, w^T$ of the data vector that is based on $y$ can be subtracted from $x$ to
produce new examples $x' = x - r$, which can be analyzed by a second unit to
extract the second principal component. Another possibility is to extend the learn-
ing rule for a multi-unit network to $\Delta w_r = \eta y_r \bigl( x - \sum_{s \le r} y_s w_s \bigr)$, as proposed
by Sanger [203]. The principal component analysis (PCA) network decorrelates its
outputs $y_k$ and hence removes linear dependencies from the input. Because the num-
ber of output units can be chosen smaller than the number of input components, the
linear PCA transformation can be used to reduce the dimensionality of the data with
minimal loss of variance.
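A sketch of Sanger's rule for a small multi-unit network (toy data and parameters invented for illustration); each unit $r$ subtracts the reconstruction formed by the units $s \le r$ before applying the Hebbian term:

```python
import numpy as np

def sanger_update(W, x, eta=0.005):
    """Sanger's rule: Delta w_r = eta * y_r * (x - sum_{s<=r} y_s w_s).
    W holds one weight vector per output unit (row-wise)."""
    y = W @ x
    for r in range(W.shape[0]):
        recon = y[: r + 1] @ W[: r + 1]        # sum_{s<=r} y_s w_s
        W[r] += eta * y[r] * (x - recon)
    return W

rng = np.random.default_rng(2)
X = rng.normal(size=(5000, 4)) * np.array([3.0, 2.0, 1.0, 0.2])  # zero-mean toy data
W = rng.normal(scale=0.1, size=(2, 4))         # two units: first two principal components
for x in X:
    W = sanger_update(W, x)

print(W @ W.T)                    # roughly the 2x2 identity: orthonormal weight vectors
```

Using only two output units for four input components, as in this sketch, corresponds to the dimensionality reduction described above.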
Independent Component Analysis. Another unsupervised learning technique is
called independent component analysis (ICA) [115, 26]. Its goal is to find a linear
transformation of the data vectors $x$ that produces a representation $y = Wx$ with
components which are not only uncorrelated, but statistically independent. This is
motivated by the assumption that the data vectors have been produced as a linear
mixture $x = As$ of independent sources $s_i$. If this is the case, ICA can be used
to separate the sources by estimating an unmixing matrix $W = A^{-1}$, a problem
known as blind source separation.
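As an illustration only (the sources and the mixing matrix $A$ below are invented, and scikit-learn's FastICA is just one of many ICA implementations), blind source separation can be sketched as follows:

```python
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(3)
n = 10000
t = np.linspace(0, 8, n)

# Two non-Gaussian sources: a sine wave and uniform noise.
S = np.c_[np.sin(2 * np.pi * t), rng.uniform(-1, 1, n)]
A = np.array([[1.0, 0.5],
              [0.7, 1.2]])        # invented mixing matrix
X = S @ A.T                       # observed mixtures x = A s

ica = FastICA(n_components=2, random_state=0)
Y = ica.fit_transform(X)          # estimated sources y = W x
# Y recovers S only up to permutation, sign, and scaling of the components.
```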
ICA is applicable if at most one of the sources has a Gaussian distribution. Prin-
cipal component analysis and whitening are usually required as preprocessing steps
to remove second-order correlations from the data vectors. This discards information
about the sign and amplitude of the sources.
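A minimal whitening sketch, assuming PCA-based preprocessing of the data (the small constant eps is added only for numerical safety):

```python
import numpy as np

def whiten(X, eps=1e-9):
    """PCA whitening: decorrelate the components and rescale them to unit variance."""
    Xc = X - X.mean(axis=0)                 # center the data
    C = np.cov(Xc, rowvar=False)            # covariance of the data vectors
    eigval, eigvec = np.linalg.eigh(C)
    W = eigvec / np.sqrt(eigval + eps)      # whitening transformation
    return Xc @ W                           # whitened data with covariance ~ identity
```

Rescaling every direction to unit variance is exactly what removes the amplitude information mentioned above.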
Some ICA methods use the fact that if two sources $s_1$ and $s_2$ are independent,
then any nonlinear transformations $g(s_1)$ and $h(s_2)$ are uncorrelated. Thus, they
perform nonlinear decorrelation to separate the sources.
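A quick numerical check of this property (the nonlinearities $g$ and $h$ and the source distributions are chosen arbitrarily):

```python
import numpy as np

rng = np.random.default_rng(4)
s1 = rng.uniform(-1, 1, 100_000)          # two independent sources
s2 = rng.laplace(size=100_000)
g, h = np.tanh, np.sign                   # arbitrary nonlinear transformations
print(np.corrcoef(g(s1), h(s2))[0, 1])    # near 0: g(s1) and h(s2) stay uncorrelated
```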
According to the central limit theorem, sums of non-Gaussian random variables
are closer to a Gaussian than the original variables. This is exploited by ICA meth-
ods that maximize the non-Gaussianity of the output components. To measure non-
Gaussianity, higher-order cumulants are used, such as the kurtosis, a normalized
form of the fourth central moment that measures the peakedness of a distribution.
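The following sketch, using scipy.stats.kurtosis (which reports the excess kurtosis, 0 for a Gaussian), illustrates both points with two invented uniform sources:

```python
import numpy as np
from scipy.stats import kurtosis          # excess kurtosis: 0 for a Gaussian

rng = np.random.default_rng(5)
s1 = rng.uniform(-1, 1, 100_000)          # sub-Gaussian source, kurtosis around -1.2
s2 = rng.uniform(-1, 1, 100_000)          # second independent source of the same kind
mix = (s1 + s2) / np.sqrt(2)              # a sum of the two sources

print(kurtosis(s1), kurtosis(s2))         # clearly non-zero: non-Gaussian sources
print(kurtosis(mix))                      # around -0.6: closer to the Gaussian value 0
```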
Because the estimation principles discussed above use non-quadratic functions,
the computations needed usually cannot be expressed using simple linear algebra.