Digital Signal Processing Reference
A critical component of training an MLP is the number of neurons
in the hidden layer: too many neurons lead to overlearning, while too
few impair the generalization ability of the MLP.
The complexity of the MLP is determined by the number of its
adaptable parameters, that is, its weights and biases. For each
classification problem, the goal is to find the optimal complexity.
In general, complexity can be influenced by (1) data preprocessing
such as feature selection/extraction or reduction, (2) training schemes
such as cross validation and early stopping, and (3) network structure
achieved through modular networks comprising multiple networks.
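The first of these knobs, feature reduction, can be illustrated with a simple principal-component projection (a minimal numpy sketch; the data, dimensions, and number of retained components are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))     # 200 samples, 10 raw features

# Centre the data and project onto the top 3 principal components;
# an MLP trained on Z needs far fewer input weights (hence lower
# complexity) than one trained on the raw X.
Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
Z = Xc @ Vt[:3].T                  # reduced 3-dimensional features
```

Fewer input features mean fewer adaptable parameters in the first weight layer, which is exactly the complexity reduction item (1) refers to.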
The cross validation technique is usually employed when we aim at
good generalization, that is, to determine the optimal number of hidden
neurons and the point at which training should be stopped. Cross
validation is achieved by dividing the training set into two disjoint
sets: the first set is used for learning, and the second for monitoring
the classification error; training continues only as long as this error
improves. Thus, cross validation becomes an effective procedure for
detecting overfitting.
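The stopping rule described above can be sketched as an early-stopping loop (the `train_step` and `val_error` callables stand in for a real MLP trainer and for the error on the held-out set; the patience parameter and the toy error curve are illustrative assumptions):

```python
import numpy as np

def train_with_early_stopping(train_step, val_error, max_epochs=100, patience=3):
    """Stop training once the held-out classification error stops improving.

    train_step() runs one epoch of learning on the first (learning) set;
    val_error() returns the error on the second (monitoring) set.
    """
    best, stale = np.inf, 0
    for epoch in range(max_epochs):
        train_step()
        err = val_error()
        if err < best:
            best, stale = err, 0      # improvement: keep training
        else:
            stale += 1                # no improvement on held-out set
            if stale >= patience:
                break                 # overfitting detected: stop
    return best

# Toy validation-error curve: improves, then degrades as overfitting sets in.
errs = iter([0.5, 0.4, 0.3, 0.32, 0.33, 0.34, 0.31])
best = train_with_early_stopping(lambda: None, lambda: next(errs))
```

The loop returns the best error seen before degradation (here 0.3); in practice one would also restore the weights from that epoch.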
In general, the best generalization is achieved when three disjoint
data sets are used: a training, a validation, and a testing set. The
first two sets are used to train the network while avoiding overfitting,
and the test set provides an unbiased estimate of the classification
performance.
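A three-way split of the available samples can be sketched as follows (the 60/20/20 proportions are an illustrative assumption, not a prescription from the text):

```python
import numpy as np

def three_way_split(n, frac_train=0.6, frac_val=0.2, seed=0):
    """Split n sample indices into disjoint training/validation/test sets."""
    idx = np.random.default_rng(seed).permutation(n)
    n_tr = int(frac_train * n)
    n_va = int(frac_val * n)
    return idx[:n_tr], idx[n_tr:n_tr + n_va], idx[n_tr + n_va:]

tr, va, te = three_way_split(100)
```

Because the three index sets are disjoint, the error measured on the test set is not contaminated by either the weight updates (training set) or the stopping decision (validation set).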
Modular networks
Modular networks represent an important class of connectionist
architectures and implement the principle of divide and conquer: a
complex task (the classification problem) is solved collectively by a
mixture of experts (a hierarchy of neural networks). Mathematically,
they belong to the group of universal approximators. Their architecture
has two main components: expert networks and a gating network. The
idea of the committee machine was first introduced by Nilsson [186].
The most important types of modular networks are described below.
Mixture of experts : The architecture is based on experts and a
single gating network that yields a nonlinear function of the individual
responses of the experts.
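The combination rule can be sketched in a few lines: each expert produces an output, the gating network assigns input-dependent mixing coefficients via a softmax, and the mixture output is their convex combination (the linear experts and linear gate below are illustrative placeholders, not the only choice):

```python
import numpy as np

def softmax(a):
    e = np.exp(a - a.max())            # shift for numerical stability
    return e / e.sum()

def mixture_of_experts(x, expert_weights, gate_weights):
    """Combine K linear experts y_k = W_k x with a softmax gating network."""
    y = np.stack([W @ x for W in expert_weights])   # (K, out_dim) expert outputs
    g = softmax(gate_weights @ x)                   # (K,) mixing coefficients
    return g @ y                                    # convex combination

x = np.array([1.0, 2.0])
experts = [np.eye(2), 2 * np.eye(2)]   # two linear experts: x and 2x
gate = np.zeros((2, 2))                # zero gate => uniform g = (0.5, 0.5)
y = mixture_of_experts(x, experts, gate)
```

Because the gate depends on the input, different experts dominate in different regions of the input space, which is what makes the combined response a nonlinear function of the expert outputs.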
Hierarchical mixture of experts : This comprises several groups of
mixtures of experts whose responses are in turn combined by gating
networks. The architecture is a tree in which the gating networks sit
at the nonterminal nodes and the expert networks at the leaves.
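A two-level instance of this tree can be sketched as follows: lower-level gates blend the experts inside each group, and a top-level gate blends the group outputs (all-linear experts and gates again serve only as placeholders):

```python
import numpy as np

def softmax(a):
    e = np.exp(a - a.max())
    return e / e.sum()

def hme_output(x, groups, top_gate):
    """Two-level hierarchical mixture of experts.

    Each group is a (expert_weights, gate_weights) pair, i.e. one
    mixture of experts; top_gate blends the group outputs.
    """
    outs = []
    for expert_ws, gate_w in groups:
        y = np.stack([W @ x for W in expert_ws])   # expert outputs in this group
        outs.append(softmax(gate_w @ x) @ y)       # leaf-level gate blends them
    return softmax(top_gate @ x) @ np.stack(outs)  # root gate blends the groups

x = np.array([1.0, 1.0])
group = ([np.eye(2), 3 * np.eye(2)], np.zeros((2, 2)))  # uniform gate => output 2x
out = hme_output(x, [group, group], np.zeros((2, 2)))
```

Deeper trees follow the same pattern: every internal node is a gating network over its subtrees, and only the leaves contain expert networks.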