Numerical optimization methods, e.g. gradient descent or the fixed-point algorithm
called FastICA [106], are employed to estimate W .
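To make the fixed-point approach concrete, the following minimal sketch estimates an unmixing matrix row by row from data that is assumed to be centered and whitened, using the common tanh contrast function; the deflation scheme, constants, and function names are illustrative assumptions rather than the exact formulation of [106].

import numpy as np

def fastica(X, n_components, max_iter=200, tol=1e-5):
    """Fixed-point FastICA with deflation; X is centered, whitened data of shape (dims, samples)."""
    dims, _ = X.shape
    rng = np.random.default_rng(0)
    W = np.zeros((n_components, dims))
    for i in range(n_components):
        w = rng.standard_normal(dims)
        w /= np.linalg.norm(w)
        for _ in range(max_iter):
            wx = w @ X                                   # projections of all samples
            g = np.tanh(wx)                              # tanh contrast function
            g_prime = 1.0 - g ** 2
            w_new = (X * g).mean(axis=1) - g_prime.mean() * w   # fixed-point update
            w_new -= W[:i].T @ (W[:i] @ w_new)           # deflate against earlier components
            w_new /= np.linalg.norm(w_new)
            converged = abs(abs(w_new @ w) - 1.0) < tol  # direction unchanged up to sign
            w = w_new
            if converged:
                break
        W[i] = w
    return W                                             # estimated sources: S = W @ X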
Other Unsupervised Learning Techniques. Because the goals of unsupervised
learning can vary greatly, there exist many different unsupervised learning
techniques that have not been discussed so far.
One example is slow feature analysis (SFA), recently proposed by Wiskott and
Sejnowski [244]. This method focuses on finding representations that change only
slowly as input examples undergo a transformation. SFA expands the input signal
non-linearly and applies PCA to this expanded signal and its time derivative.
The components with the lowest variance are selected as slow features. Temporal
smoothing of the network's output is also the basis of the method proposed by
Foldiak [69] for the learning of invariant features.
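To illustrate the SFA procedure sketched above, the following minimal example applies a quadratic expansion, whitens the expanded signal by PCA, and then selects the directions along which a finite-difference approximation of the time derivative has the lowest variance; this expansion and derivative approximation are common choices and not necessarily those of [244].

import numpy as np

def sfa(x, n_slow=2):
    """Minimal slow feature analysis for a multivariate time series x of shape (T, d)."""
    T, d = x.shape
    # non-linear (quadratic) expansion: the inputs plus all products x_i * x_j
    rows, cols = np.triu_indices(d)
    z = np.hstack([x, (x[:, :, None] * x[:, None, :])[:, rows, cols]])
    z -= z.mean(axis=0)
    # whiten the expanded signal (PCA of the expanded signal)
    eigval, eigvec = np.linalg.eigh(z.T @ z / T)
    keep = eigval > 1e-7                          # drop near-singular directions
    white = eigvec[:, keep] / np.sqrt(eigval[keep])
    zw = z @ white
    # PCA of the (finite-difference) time derivative of the whitened signal
    dz = np.diff(zw, axis=0)
    _, dvec = np.linalg.eigh(dz.T @ dz / (T - 1))
    slow_dirs = dvec[:, :n_slow]                  # lowest derivative variance = slowest
    return zw @ slow_dirs                         # slow feature outputs, shape (T, n_slow)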
Another example of unsupervised techniques is the learning of sparse features.
Sparse representations can be viewed as a generalization of the local representations
generated by WTA networks. While in a local representation exactly one unit is
active, in a sparse representation multiple units can be active, but the ratio of
active to inactive units is low. This increases the representational power
of the code, facilitates generalization, allows for controlled inference, increases the
capacity of associative memories, implements fault tolerance, and allows for the
simultaneous representation of multiple items by superposition of individual encod-
ings [70]. There is substantial evidence that the human visual system utilizes sparse
coding to represent properties of visual scenes [215].
A simple local unsupervised algorithm for learning such representations in a
nonlinear neural network was proposed by Foldiak [68]. It uses Hebbian forward
connections to detect non-accidental features, an adaptive threshold to keep the
activity ratio low, and anti-Hebbian decorrelating lateral connections to keep
redundancy low. It produces codes with few active units for frequent patterns, while less
probable patterns are encoded using a higher number of active units.
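The following simplified sketch illustrates the interplay of these three ingredients: Hebbian forward weights that adapt to frequent input features, anti-Hebbian lateral weights that decorrelate the units, and adaptive thresholds that hold each unit's activity ratio near a target rate. The settling loop, learning rates, and target rate are illustrative assumptions and not a faithful reproduction of [68].

import numpy as np

def sparse_coding_network(patterns, n_units=16, p=0.1, epochs=50,
                          alpha=0.1, beta=0.02, gamma=0.02, seed=0):
    """Sketch of a sparse coding network trained on binary patterns of shape (N, n_inputs)."""
    rng = np.random.default_rng(seed)
    n_inputs = patterns.shape[1]
    Q = rng.random((n_units, n_inputs))
    Q /= Q.sum(axis=1, keepdims=True)             # normalized Hebbian forward weights
    W = np.zeros((n_units, n_units))              # anti-Hebbian lateral weights
    t = np.full(n_units, 0.5)                     # adaptive thresholds

    for _ in range(epochs):
        for x in patterns:
            drive = Q @ x                         # feedforward activation
            y = (drive > t).astype(float)
            for _ in range(10):                   # settle under lateral inhibition
                y = ((drive + W @ y) > t).astype(float)
            W -= alpha * (np.outer(y, y) - p ** 2)   # decorrelate: push correlations toward p^2
            np.fill_diagonal(W, 0.0)
            W[W > 0] = 0.0                        # lateral weights remain inhibitory
            Q += beta * y[:, None] * (x - Q)      # Hebbian update for active units only
            t += gamma * (y - p)                  # keep each unit's activity ratio near p
    return Q, W, t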
Other algorithms for the learning of sparse features adjust connection weights
by explicitly maximizing measures of sparseness, successfully producing V1
simple cell-like features [170]. This class of algorithms is closely related to ICA since
sparse distributions are also non-Gaussian.
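As a rough sketch of such a sparseness-maximizing approach, the following routine alternates gradient steps on the coefficients of a linear generative model, penalized by a smooth sparseness measure, with a gradient step on the basis vectors; the particular penalty, step sizes, and names are illustrative assumptions and not the exact algorithm of [170].

import numpy as np

def sparse_coding_step(D, X, lam=0.1, n_iter=50, lr=0.1):
    """One update round of sparseness-penalized dictionary learning.

    X holds data vectors in its columns, D holds basis vectors in its columns.
    The codes A descend 0.5 * ||X - D A||^2 + lam * sum(log(1 + A**2)),
    after which the basis receives one gradient step and is renormalized.
    """
    n_atoms = D.shape[1]
    A = np.zeros((n_atoms, X.shape[1]))
    for _ in range(n_iter):                               # infer sparse codes for a fixed basis
        residual = X - D @ A
        grad_A = -D.T @ residual + lam * 2.0 * A / (1.0 + A ** 2)
        A -= lr * grad_A
    D = D + lr * (X - D @ A) @ A.T / X.shape[1]           # one gradient step on the basis
    D /= np.linalg.norm(D, axis=0, keepdims=True)         # keep basis vectors at unit length
    return D, A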
Beyond sparseness, another interesting property of a representation is the
interpretability of encodings. While a randomly chosen codeword could only signal the
presence of an item, Barlow [15] suggested that the cortex might use sparse codes
where the individual units signal the presence of meaningful features in the input.
In this scheme, items are encoded by combinations of features.
In the following section, I introduce an unsupervised learning algorithm for
the forward projections of the Neural Abstraction Pyramid. It is based on Hebbian
weight updates and lateral competition and yields a sequence of more and more
abstract representations. With increasing height, the spatial resolution of feature arrays
decreases, feature diversity increases, and the representations become increasingly
sparse.