of network inputs. The geometric pyramid rule, on the other hand, suggests assigning

N_h = α √(N_i · N_o)

hidden neurons to a single hidden layer, where N_i is the number of network inputs, N_o the number of its outputs, and α is a multiplication factor whose value, depending on the complexity of the problem to be solved, should be selected in the range 0.5 < α < 2. Baum and Haussler (1989) suggested the number
of neurons in the hidden layer be determined as
NE
u
tr
tol
N
d
,
h
NN
dp
o
where N_tr is the number of training examples, E_tol is the error tolerance, N_dp is the number of data points per training example, and N_o is the number of output neurons.
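Both rules of thumb can be sketched in Python; the function names and the sample values below are illustrative choices, not taken from the text:

```python
import math

def pyramid_rule(n_inputs, n_outputs, alpha=1.0):
    """Geometric pyramid rule: N_h = alpha * sqrt(N_i * N_o),
    with 0.5 < alpha < 2 chosen according to problem complexity."""
    return round(alpha * math.sqrt(n_inputs * n_outputs))

def baum_haussler(n_train, err_tol, n_datapoints, n_outputs):
    """Baum-Haussler estimate: N_h = (N_tr * E_tol) / (N_dp + N_o)."""
    return round((n_train * err_tol) / (n_datapoints + n_outputs))

# Illustrative numbers only:
print(pyramid_rule(64, 4, alpha=1.5))    # -> 24
print(baum_haussler(1000, 0.1, 16, 4))   # -> 5
```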
In any case, the determination of the optimal number of hidden neurons involves trial-and-error experimentation: one starts with some number of neurons in the layer and then, based on the final accuracy of each learning process, increases or decreases the number of hidden neurons and starts a new learning process. In this way redundant hidden neurons can be deleted and the neurons needed for optimal performance of the layer added. It is possible to start with either a relatively large or a relatively small number of neurons, but starting with a large number bears the risk of long computation times and of getting trapped in local minima.
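The trial-and-error procedure can be sketched as a simple search loop; the `train_and_evaluate` callback, the starting size, and the stopping rule are all assumptions made for illustration:

```python
def select_hidden_neurons(train_and_evaluate, start=2, max_neurons=50, patience=3):
    """Trial-and-error search: grow the hidden layer one neuron at a
    time and keep the size with the best validation accuracy.
    `train_and_evaluate(n_hidden)` is a user-supplied function that
    trains a fresh network and returns its validation accuracy."""
    best_n, best_acc, no_improve = start, float("-inf"), 0
    for n in range(start, max_neurons + 1):
        acc = train_and_evaluate(n)        # a new learning process per size
        if acc > best_acc:
            best_n, best_acc, no_improve = n, acc, 0
        else:
            no_improve += 1
            if no_improve >= patience:     # stop when adding neurons no longer helps
                break
    return best_n, best_acc
```

The same loop run downward from a large starting size would implement the neuron-deletion direction instead.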
Khorasani and Weng (1994) have presented an approach to structural adaptation of feedforward neural networks by addition and deletion of hidden neurons, based on the activity status of individual neurons during learning, measured by the variance of the neuron output signal and by the strength of the backpropagated error. This is a proper indication of neuron activity that helps decide which low-activity, redundant neurons are to be deleted.
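A minimal sketch of the output-variance part of such activity-based pruning, assuming the hidden-layer outputs over the training set are available as a matrix (the threshold value is an arbitrary choice, not from the cited work):

```python
import numpy as np

def prune_low_activity(hidden_outputs, var_threshold=1e-3):
    """Flag low-activity hidden neurons in the spirit of
    Khorasani and Weng: `hidden_outputs` has shape
    (n_samples, n_hidden); neurons whose output variance over the
    training set falls below the threshold are marked redundant.
    Returns (indices to keep, indices to prune)."""
    variances = hidden_outputs.var(axis=0)
    keep = variances >= var_threshold
    return np.flatnonzero(keep), np.flatnonzero(~keep)
```

A full implementation would combine this with the backpropagated-error criterion the text mentions.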
There is also a reliable way to determine the number of hidden neurons using Akaike's information criterion (AIC), originally defined as

AIC = −2 ln(maximum likelihood) + 2 (number of adjusted parameters).

The criterion statistically evaluates the goodness of a model by combining the evaluated mean squared error for the training data and the number of parameters to be estimated. Viewed differently, AIC combines a measure of fit with a penalty term that accounts for model complexity. Its suitability for neural network model building was recognized by Kurita (1990) and Fogel (1991), who reformulated the original form of the criterion (for statistically independent, normally distributed output errors with zero mean and constant variance) as