- Bias: the systematic component of the approximation error, measuring how close the average classifier produced by the learning algorithm will be to the target function;
- Variance: the sensitivity of the approximation to the finite size of the training sample, measuring how much the learning algorithm's guesses will vary with respect to each other;
- Intrinsic target noise: due to noise in the training sample, measuring the minimum classification error associated with the Bayes optimal classifier for the target function.
A tradeoff between the bias component and the variance component of the system must always be made. While reducing the degrees of freedom of a learning
machine lowers the variance error, the restriction of possible solutions increases the
bias error. For this reason, it is important to ensure that a restricted learning system
is still appropriate for the task to be learned. In the context of the Neural Abstraction
Pyramid architecture, such restrictions include hierarchical network structure, local
connectivity, and weight sharing. Chapter 4 discussed why such restrictions are appropriate for image interpretation tasks.
In general, it is hard to assess the generalization of a learning system using
the training set alone. For this reason, one may hold back some examples of the
dataset from training to test generalization [238]. One way to restrict the degrees
of freedom of a learning system trained with an incremental algorithm is to use
early stopping [179]. This terminates the training if the performance on a test set
starts to degrade. Another way to assess generalization is cross-validation [205], which trains multiple classifiers on different subsets of the training set. If random subsamples of the training set, drawn with replacement, are used instead of disjoint subsets, the method is called bootstrapping [60].
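The splitting schemes above can be sketched in a few lines of Python; the helper names (`kfold_splits`, `bootstrap_sample`) are illustrative, not from the text:

```python
import random

def kfold_splits(data, k):
    """Cross-validation: partition the data into k disjoint folds
    and hold each fold out once as the test set."""
    folds = [data[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, test

def bootstrap_sample(data, rng):
    """Bootstrapping: draw a subsample of the same size,
    sampling with replacement (some items repeat, others are left out)."""
    return [rng.choice(data) for _ in data]
```

Each of the k classifiers is trained on its `train` portion and evaluated on the held-out `test` fold; averaging the k test errors estimates generalization.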
6.2 Feed-Forward Neural Networks
Artificial neural networks are popular tools for supervised learning. Feed-forward
neural networks (FFNN) compute complex functions in directed acyclic graphs of
primitive functions. Usually, the nodes of the graph are arranged in layers. The primitive functions computed by the nodes access other nodes via weighted links. For example, Σ-units compute a weighted sum of their inputs. This sum is passed through a transfer function that may be non-linear.
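As a minimal sketch (the function name and the choice of a sigmoidal transfer function are assumptions, not from the text), a Σ-unit might look like:

```python
import math

def sigma_unit(inputs, weights, bias):
    """Compute the weighted sum of the inputs plus a bias,
    then pass it through a sigmoidal (non-linear) transfer function."""
    s = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1.0 / (1.0 + math.exp(-s))
```

A zero weighted sum yields an output of 0.5; larger sums push the output toward 1, smaller sums toward 0.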
The first trainable neural network was proposed in 1958 by Rosenblatt [194]. The classical perceptron was a multilayer network that could be trained to recognize patterns on a retina. Its processing units, the perceptrons, computed weighted sums of their inputs, followed by a threshold nonlinearity.
The model was simplified and analyzed by Minsky and Papert [158]. They
showed that the perceptron learning algorithm can solve linearly separable problems, but cannot learn all possible Boolean functions. This result applies to any feed-forward neural network without hidden units. For example, the XOR function cannot be computed by such a network, since its two classes are not linearly separable.
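This limitation can be demonstrated with a minimal single-unit perceptron trained by the classical error-correction rule (a sketch; the function and variable names are illustrative): it converges on the linearly separable AND function, but on XOR the error count never reaches zero.

```python
def train_perceptron(samples, epochs=25):
    """Single threshold unit trained with Rosenblatt's error-correction rule."""
    w, b = [0.0, 0.0], 0.0
    errors = 0
    for _ in range(epochs):
        errors = 0
        for (x0, x1), target in samples:
            out = 1 if w[0] * x0 + w[1] * x1 + b > 0 else 0
            if out != target:                 # update weights only on errors
                errors += 1
                w[0] += (target - out) * x0
                w[1] += (target - out) * x1
                b += target - out
        if errors == 0:                       # all samples classified correctly
            break
    return w, b, errors                       # errors made in the last epoch

AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]
```

Training on AND terminates with zero errors, while training on XOR keeps cycling with at least one misclassified sample per epoch, regardless of how many epochs are allowed.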