given mapping, so that its relevance to neural networks was not directly evident.
There were even opposing views on this relevance: one against it
(Girosi and Poggio, 1989) and another in its favour. However, it was the
refinement of the theorem by Sprecher (1965) that motivated Hecht-Nielsen
(1987b) to point out this relevance. He also proposed that the k-th processing
element of the hidden layer should have the activation function
z_k = \sum_{i=1}^{n} \lambda^{k}\, \psi(x_i + \varepsilon k) + k ,
where the real constant λ and the monotonically increasing real continuous function
ψ depend on n, but are independent of f. Furthermore, the rational constant ε should
satisfy the conditions of the Sprecher theorem, 0 < ε < δ, δ > 0. The activation
function of the output layer units should be
y_j = \sum_{k=1}^{2n+1} g_j(z_k) ,
where the g_j are real and continuous functions depending on f and ε.
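Putting these two expressions together gives the overall mapping realised by the network; the composed formula below is a sketch implied by the two equations above, written for a network with m outputs f_1, ..., f_m:

f_j(x_1, \dots, x_n) = \sum_{k=1}^{2n+1} g_j\!\Big( \sum_{i=1}^{n} \lambda^{k}\, \psi(x_i + \varepsilon k) + k \Big) = \sum_{k=1}^{2n+1} g_j(z_k), \qquad j = 1, \dots, m,

which is exactly the Sprecher form of Kolmogorov's representation implemented by the network.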
Consequently, as shown by Hecht-Nielsen (1987b), Kolmogorov's theorem
can be implemented exactly by a three-layer feedforward neural network having n
elements in the input layer, (2n+1) processing elements in the hidden layer,
and m processing elements in the output layer. This confirms the statement that
even a single-hidden-layer network is sufficient to reveal all the characteristic
features present at the input nodes of the network. Introducing additional hidden
layers increases the feature extraction capability of the network, but at the cost of
significantly extended training and operational time of the forecaster.
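As an illustration of this architecture, the following short Python sketch wires up a network with n input elements, 2n+1 hidden processing elements computing z_k, and m output elements computing y_j. The functions psi and g_j, as well as the constants lam and eps, are placeholders chosen here only to make the sketch runnable; they are not the actual functions and constants guaranteed by the Sprecher theorem.

    import numpy as np

    # Structural sketch of the Kolmogorov-type three-layer network described above:
    # n inputs, 2n + 1 hidden processing elements, m output processing elements.
    def kolmogorov_network(x, g_list, psi=np.tanh, lam=0.5, eps=0.1):
        """x: input vector of length n; g_list: list of m output functions g_j."""
        n = len(x)
        # Hidden layer: z_k = sum_i lam**k * psi(x_i + eps*k) + k, for k = 1, ..., 2n+1
        z = np.array([(lam ** k) * np.sum(psi(x + eps * k)) + k
                      for k in range(1, 2 * n + 2)])
        # Output layer: y_j = sum_k g_j(z_k), for j = 1, ..., m
        return np.array([np.sum(g(z)) for g in g_list])

    # Example with n = 3 inputs and m = 2 outputs (placeholder output functions):
    x = np.array([0.2, -0.4, 0.7])
    y = kolmogorov_network(x, g_list=[np.sin, np.cos])   # y has shape (2,)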
Lippmann (1987), in his celebrated paper on neurocomputing, stated clearly
that a three-layer perceptron can form arbitrarily complex decision regions and can
separate meshed classes, which means that no more than three network layers are
needed in perceptron-like feedforward nets. This particularly holds for
networks with a single output, as required for one-step-ahead forecasting. Cybenko
(1989) finally underlined that networks never need more than two hidden
layers to solve even the most complex problems. Moreover, investigations of neural network
capabilities in relation to their internal structure have shown that two-hidden-layer
networks are more prone to falling into bad local minima, while DeVilliers and Barnard
(1992) pointed out that one- and two-hidden-layer networks perform
similarly in all other respects. This can be understood by comparing the degree of
complexity of the two network types, measured by the Vapnik-
Chervonenkis dimension, as was done by Baum and Haussler (1989).
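To give a feel for this comparison, the following sketch counts the adjustable weights of a one-hidden-layer and a two-hidden-layer network with the same total number of hidden neurons and evaluates a Baum-Haussler-style bound of the form VCdim <= 2W log2(eN), with W the number of weights and N the number of computational units. The layer sizes and the exact form of the bound are illustrative assumptions, not values taken from the cited papers.

    import math

    def weight_count(layer_sizes):
        # Number of weights (including biases) in a fully connected feedforward net.
        return sum((a + 1) * b for a, b in zip(layer_sizes[:-1], layer_sizes[1:]))

    def vc_bound(layer_sizes):
        # Baum-Haussler-style upper bound: 2 * W * log2(e * N).
        W = weight_count(layer_sizes)
        N = sum(layer_sizes[1:])          # computational (non-input) units
        return 2 * W * math.log2(math.e * N)

    one_hidden = [10, 12, 1]              # 10 inputs, 12 hidden neurons, 1 output
    two_hidden = [10, 6, 6, 1]            # same 12 hidden neurons split over two layers
    print(weight_count(one_hidden), round(vc_bound(one_hidden)))
    print(weight_count(two_hidden), round(vc_bound(two_hidden)))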
We now turn to the problem of the number of hidden neurons placed within
the hidden layer. There is no straightforward methodology for determining the
optimal number of hidden neurons, but some rules of thumb and suggestions for
doing this have been proposed. For instance, in single-hidden-layer networks it is
recommended to take the number of hidden-layer neurons in the neighbourhood of
75% of the number of network inputs, or, say, between 0.5 and 3 times the number