given mapping, so that its relevance to neural networks was not directly evident.
There were even opposing views on this relevance: one against it
(Girosi and Poggio, 1989) and another in its favour. However, it was the
refinement of the theorem by Sprecher (1965) that motivated Hecht-Nielsen
(1987b) to point out this relevance. He also proposed that the k-th processing
element of the hidden layer should have the activation function
z_k = \sum_{i=1}^{n} \lambda^{k}\, \psi(x_i + \varepsilon k) + k ,
where the real constant λ and the monotonically increasing real continuous function
ψ depend on n, but are independent of f. Furthermore, the rational constant ε should
satisfy the conditions of the Sprecher theorem, 0 < ε < δ, δ > 0. The activation
function of the output layer units should be
y_j = \sum_{k=1}^{2n+1} g_j(z_k) ,
where the g_j are real and continuous functions depending on f and ε.
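Putting these two expressions together gives the overall mapping realised by the network; the composed formula below is a sketch implied by the two equations above, written for a network with m outputs f_1, ..., f_m:

f_j(x_1, \dots, x_n) = \sum_{k=1}^{2n+1} g_j\!\Big( \sum_{i=1}^{n} \lambda^{k}\, \psi(x_i + \varepsilon k) + k \Big) = \sum_{k=1}^{2n+1} g_j(z_k), \qquad j = 1, \dots, m,

which is exactly the Sprecher form of Kolmogorov's representation implemented by the network.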
Consequently, as shown by Hecht-Nielsen (1987b), Kolmogorov's theorem
can be implemented exactly by a three-layer feedforward neural network having n
elements in the input layer, (2n+1) processing elements in the hidden layer,
and m processing elements in the output layer. This confirms the statement that
even a single-hidden-layer network is sufficient to reveal all the characteristic
features present at the input nodes of the network. Introducing additional hidden
layers increases the feature extraction capability of the network, but at the cost of
significantly extended training and operational time of the forecaster.
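As an illustration of this architecture, the following short Python sketch wires up a network with n input elements, 2n+1 hidden processing elements computing z_k, and m output elements computing y_j. The functions psi and g_j, as well as the constants lam and eps, are placeholders chosen here only to make the sketch runnable; they are not the actual functions and constants guaranteed by the Sprecher theorem.

    import numpy as np

    # Structural sketch of the Kolmogorov-type three-layer network described above:
    # n inputs, 2n + 1 hidden processing elements, m output processing elements.
    def kolmogorov_network(x, g_list, psi=np.tanh, lam=0.5, eps=0.1):
        """x: input vector of length n; g_list: list of m output functions g_j."""
        n = len(x)
        # Hidden layer: z_k = sum_i lam**k * psi(x_i + eps*k) + k, for k = 1, ..., 2n+1
        z = np.array([(lam ** k) * np.sum(psi(x + eps * k)) + k
                      for k in range(1, 2 * n + 2)])
        # Output layer: y_j = sum_k g_j(z_k), for j = 1, ..., m
        return np.array([np.sum(g(z)) for g in g_list])

    # Example with n = 3 inputs and m = 2 outputs (placeholder output functions):
    x = np.array([0.2, -0.4, 0.7])
    y = kolmogorov_network(x, g_list=[np.sin, np.cos])   # y has shape (2,)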
Lippmann (1987), in his celebrated paper on neurocomputing, stated clearly
that a three-layer perceptron can form arbitrarily complex decision regions and can
separate meshed classes, which means that no more than three network layers are
needed in perceptron-like feedforward nets. This particularly holds for
networks with a single output, as required for one-step-ahead forecasting. Cybenko
(1989) finally underlined that networks never need more than two hidden
layers to solve even the most complex problems. Moreover, investigations of neural network
capabilities in relation to their internal structure have shown that two-hidden-layer
networks are more prone to falling into bad local minima, while DeVilliers and Barnard
(1992) pointed out that one- and two-hidden-layer networks perform
similarly in all other respects. This can be understood by comparing the degree of
complexity of the two network types, measured by the Vapnik-
Chervonenkis dimension, as was done by Baum and Haussler (1989).
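To give a feel for this comparison, the following sketch counts the adjustable weights of a one-hidden-layer and a two-hidden-layer network with the same total number of hidden neurons and evaluates a Baum-Haussler-style bound of the form VCdim <= 2W log2(eN), with W the number of weights and N the number of computational units. The layer sizes and the exact form of the bound are illustrative assumptions, not values taken from the cited papers.

    import math

    def weight_count(layer_sizes):
        # Number of weights (including biases) in a fully connected feedforward net.
        return sum((a + 1) * b for a, b in zip(layer_sizes[:-1], layer_sizes[1:]))

    def vc_bound(layer_sizes):
        # Baum-Haussler-style upper bound: 2 * W * log2(e * N).
        W = weight_count(layer_sizes)
        N = sum(layer_sizes[1:])          # computational (non-input) units
        return 2 * W * math.log2(math.e * N)

    one_hidden = [10, 12, 1]              # 10 inputs, 12 hidden neurons, 1 output
    two_hidden = [10, 6, 6, 1]            # same 12 hidden neurons split over two layers
    print(weight_count(one_hidden), round(vc_bound(one_hidden)))
    print(weight_count(two_hidden), round(vc_bound(two_hidden)))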
We now turn to the problem of the number of hidden neurons placed within
the hidden layer. There is no straightforward methodology for determining the
optimal number of hidden neurons, but some rules of thumb and suggestions for
doing this have been proposed. For instance, in single-hidden-layer networks it is
recommended to take the number of hidden-layer neurons in the neighbourhood of
75% of the number of network inputs, or, say, between 0.5 and 3 times the number