- Bias: the systematic component of the approximation error, measuring how close the average classifier produced by the learning algorithm will be to the target function;
- Variance: the sensitivity of the approximation to the finite size of the training sample, measuring how much the learning algorithm's guesses will vary with respect to each other;
- Intrinsic target noise: due to noise in the training sample, measuring the minimum classification error associated with the Bayes optimal classifier for the target function.
A tradeoff between the bias component and the variance component of the system must always be made. While reducing the degrees of freedom of a learning
machine lowers the variance error, the restriction of possible solutions increases the
bias error. For this reason, it is important to ensure that a restricted learning system
is still appropriate for the task to be learned. In the context of the Neural Abstraction
Pyramid architecture, such restrictions include hierarchical network structure, local
connectivity, and weight sharing. Chapter 4 discussed why such restrictions are appropriate for image interpretation tasks.
In general, it is hard to assess the generalization of a learning system using
the training set alone. For this reason, one may hold back some examples of the
dataset from training to test generalization [238]. One way to restrict the degrees
of freedom of a learning system trained with an incremental algorithm is to use
early stopping [179]. This terminates the training if the performance on a test set
starts to degrade. Another way to assess generalization is cross-validation [205], which trains multiple classifiers on different subsets of the training set. If random subsamples of the training set, drawn with replacement, are used instead of disjoint subsets, the method is called bootstrapping [60].
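The splitting schemes above can be sketched in a few lines of Python; the helper names (`kfold_splits`, `bootstrap_sample`) are illustrative, not from the text:

```python
import random

def kfold_splits(data, k):
    """Cross-validation: partition the data into k disjoint folds
    and hold each fold out once as the test set."""
    folds = [data[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, test

def bootstrap_sample(data, rng):
    """Bootstrapping: draw a subsample of the same size,
    sampling with replacement (some items repeat, others are left out)."""
    return [rng.choice(data) for _ in data]
```

Each of the k classifiers is trained on its `train` portion and evaluated on the held-out `test` fold; averaging the k test errors estimates generalization.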
6.2 Feed-Forward Neural Networks
Artificial neural networks are popular tools for supervised learning. Feed-forward
neural networks (FFNN) compute complex functions in directed acyclic graphs of
primitive functions. Usually, the nodes of the graph are arranged in layers. The primitive functions computed by the nodes access other nodes via weighted links. For example, Σ-units compute a weighted sum of their inputs. This sum is passed through a transfer function that may be non-linear.
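As a minimal sketch (the function name and the choice of a sigmoidal transfer function are assumptions, not from the text), a Σ-unit might look like:

```python
import math

def sigma_unit(inputs, weights, bias):
    """Compute the weighted sum of the inputs plus a bias,
    then pass it through a sigmoidal (non-linear) transfer function."""
    s = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1.0 / (1.0 + math.exp(-s))
```

A zero weighted sum yields an output of 0.5; larger sums push the output toward 1, smaller sums toward 0.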
The first trainable neural network was proposed in 1958 by Rosenblatt [194]. The classical perceptron was a multilayer network that could be trained to recognize patterns on a retina. Its processing units, the perceptrons, computed weighted sums of their inputs, followed by a threshold nonlinearity.
The model was simplified and analyzed by Minsky and Papert [158]. They
showed that the perceptron learning algorithm can solve linearly separable problems, but cannot learn all possible Boolean functions. This result applies to any feed-forward neural network without hidden units. For example, the XOR function cannot be computed by such a network, since its two classes are not linearly separable.
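This limitation can be demonstrated with a minimal single-unit perceptron trained by the classical error-correction rule (a sketch; the function and variable names are illustrative): it converges on the linearly separable AND function, but on XOR the error count never reaches zero.

```python
def train_perceptron(samples, epochs=25):
    """Single threshold unit trained with Rosenblatt's error-correction rule."""
    w, b = [0.0, 0.0], 0.0
    errors = 0
    for _ in range(epochs):
        errors = 0
        for (x0, x1), target in samples:
            out = 1 if w[0] * x0 + w[1] * x1 + b > 0 else 0
            if out != target:                 # update weights only on errors
                errors += 1
                w[0] += (target - out) * x0
                w[1] += (target - out) * x1
                b += target - out
        if errors == 0:                       # all samples classified correctly
            break
    return w, b, errors                       # errors made in the last epoch

AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]
```

Training on AND terminates with zero errors, while training on XOR keeps cycling with at least one misclassified sample per epoch, regardless of how many epochs are allowed.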