Murata et al. (1994) used this generalization to determine the number of hidden units required to mimic the system on the basis of input-output examples only. Attention was paid to avoiding possible network overfitting by keeping the number of redundant hidden neurons small: a larger number of hidden-layer neurons could deliver better learning results for the given training examples but, due to the increased network complexity, worse results for fresh examples.
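This trade-off can be illustrated with a small experiment; the sketch below (a hypothetical setup, not the procedure of Murata et al.) trains one-hidden-layer networks of increasing size on a toy input-output sample and compares their errors on the training data and on fresh validation data:

```python
# Hypothetical sketch of the overfitting trade-off: networks with more
# hidden neurons fit the training examples better but may generalize
# worse to fresh examples. Toy data and sizes are illustrative only.
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=(200, 1))
y = np.sin(3.0 * x) + 0.1 * rng.normal(size=(200, 1))   # noisy input-output examples
x_tr, y_tr, x_va, y_va = x[:150], y[:150], x[150:], y[150:]

def train_mlp(n_hidden, epochs=2000, lr=0.1):
    """Train a one-hidden-layer network (tanh hidden units, linear output)."""
    W1 = rng.normal(0.0, 0.5, (1, n_hidden)); b1 = np.zeros(n_hidden)
    W2 = rng.normal(0.0, 0.5, (n_hidden, 1)); b2 = np.zeros(1)
    for _ in range(epochs):
        h = np.tanh(x_tr @ W1 + b1)          # hidden-layer activations
        err = (h @ W2 + b2) - y_tr           # linear output minus target
        # Backpropagation of the mean squared error
        gW2 = h.T @ err / len(x_tr); gb2 = err.mean(axis=0)
        dh = (err @ W2.T) * (1.0 - h**2)     # tanh derivative is 1 - tanh^2
        gW1 = x_tr.T @ dh / len(x_tr); gb1 = dh.mean(axis=0)
        W1 -= lr * gW1; b1 -= lr * gb1; W2 -= lr * gW2; b2 -= lr * gb2
    return lambda xq: np.tanh(xq @ W1 + b1) @ W2 + b2

for n in (1, 2, 4, 8, 16, 32):
    predict = train_mlp(n)
    e_tr = np.mean((predict(x_tr) - y_tr) ** 2)
    e_va = np.mean((predict(x_va) - y_va) ** 2)
    print(f"hidden={n:2d}  train MSE={e_tr:.4f}  validation MSE={e_va:.4f}")
```

Typically the validation error stops improving, or worsens, beyond some hidden-layer size, while the training error keeps falling.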
As far as the interconnection of network nodes is concerned, full interconnection is recommended for the initial network configuration, in which the output of each neuron in a layer is connected to the input of each neuron in the subsequent layer. However, in some applications, deviations from full interconnection have also been successful.
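In terms of weights, full interconnection between a layer of m neurons and the subsequent layer of n neurons is simply an m-by-n weight matrix, one entry per neuron pair; a minimal sketch (sizes are arbitrary):

```python
import numpy as np

m, n = 3, 2                                       # neurons in a layer / the next layer
W = np.random.default_rng(1).normal(size=(m, n))  # one weight per output-input pair
layer_outputs = np.ones(m)                        # outputs of the current layer
next_inputs = layer_outputs @ W                   # each next-layer input sums all m outputs
```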
For activation function selection there is generally not a rich choice. The usual candidates are the

- sigmoid function, $y = \frac{1}{1 + e^{-x}}$, mostly selected for backpropagation networks in numerous applications, including time series forecasting;
- hyperbolic tangent function, $y = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}$, also used successfully in some applications, for instance when solving problems that rely on learning deviations from average behaviour (Klimasauskas, 1991);
- step and ramp functions, additional alternatives favourable for processing binary variables, as sketched in code below.
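A minimal sketch of these four functions (the step threshold and the ramp's clipping range are my own assumptions):

```python
import numpy as np

def sigmoid(x):
    """Logistic sigmoid y = 1 / (1 + e^(-x)); output in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

def hyperbolic_tangent(x):
    """y = (e^x - e^(-x)) / (e^x + e^(-x)); output in (-1, 1)."""
    return np.tanh(x)

def step(x):
    """Step function: 0 for negative inputs, 1 otherwise (threshold assumed at 0)."""
    return np.where(x < 0.0, 0.0, 1.0)

def ramp(x):
    """Ramp: identity inside [-1, 1], clipped to the bounds outside (assumed range)."""
    return np.clip(x, -1.0, 1.0)
```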
In any case, to avoid functional destruction of the neuron, the selected function should be bounded at its output, usually between the values -1 and +1. Although there are no guidelines for selecting the activation functions of individual network layers or for distributing them within a layer, it is still best to build homogeneous individual layers and, where possible, to use the sigmoid activation function for the hidden neurons. Still, some researchers have successfully used the hyperbolic tangent as the activation function of hidden-layer neurons; heterogeneous network layers have very seldom been used. For time series forecasting, general experience has shown that a linear activation function delivers the best results for the output neurons, and some theoretical evidence for this has also been given (Rumelhart et al., 1986): output neurons with a nonlinear activation function are required only for forecasting time series with a trend.
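A minimal sketch of the recommended arrangement, here using scikit-learn's MLPRegressor (whose output neuron is linear for regression); the window length of 4 and the hidden-layer size of 8 are illustrative assumptions, not values from the text:

```python
# Hypothetical sketch: sigmoid ("logistic") hidden neurons with a linear
# output neuron for one-step-ahead time series forecasting.
import numpy as np
from sklearn.neural_network import MLPRegressor

def make_windows(series, window=4):
    """Turn a series into (lagged-values, next-value) training pairs."""
    X = np.array([series[i:i + window] for i in range(len(series) - window)])
    return X, series[window:]

series = np.sin(np.linspace(0.0, 20.0, 300))      # toy series without trend
X, y = make_windows(series)

# MLPRegressor applies the identity (linear) activation at the output,
# matching the recommendation for output neurons in forecasting.
model = MLPRegressor(hidden_layer_sizes=(8,), activation="logistic",
                     max_iter=5000, random_state=0)
model.fit(X[:250], y[:250])
print("one-step-ahead forecast:", model.predict(X[250:251]))
```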
 