remedy, a normalization of the weights at every iteration step is necessary. Oja (1982) proposed using for this the normalization relationship
$$
w_i(t+1) = \frac{w_i(t) + \eta\, y(t)\, x_i(t)}{\left\{\sum_j \left[w_j(t) + \eta\, y(t)\, x_j(t)\right]^2\right\}^{1/2}},
$$
derived through modification of the Hebbian rule itself. The modification normalizes the weight vector to unit length: if one of its components increases, the values of all other components are decreased, in this way keeping the total length of the vector constant.
The above rule modification can, for a small value of η and after a power-series expansion, be approximated as
$$
w_i(t+1) \approx w_i(t) + \eta\, y(t)\left[x_i(t) - y(t)\, w_i(t)\right],
$$
which is known as Oja's rule.
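To make the two update schemes concrete, the following Python sketch applies both the explicitly normalized Hebbian update and Oja's approximation to a single linear neuron. The input data, the learning-rate value, and the function names are illustrative assumptions, not part of the original text.

import numpy as np

def normalized_hebbian_step(w, x, eta):
    """Hebbian increment followed by explicit renormalization to unit length."""
    y = w @ x                                # linear neuron output y(t)
    w_new = w + eta * y * x                  # plain Hebbian update
    return w_new / np.linalg.norm(w_new)     # divide by {sum_j [.]^2}^(1/2)

def oja_step(w, x, eta):
    """Oja's rule: first-order approximation of the normalized update."""
    y = w @ x
    return w + eta * y * (x - y * w)         # w_i + eta*y*(x_i - y*w_i)

# Assumed two-dimensional input data; both rules drive the weight vector
# towards the dominant principal component of the inputs.
rng = np.random.default_rng(0)
X = rng.multivariate_normal([0.0, 0.0], [[3.0, 1.0], [1.0, 1.0]], size=2000)
w_norm = np.array([1.0, 0.0])
w_oja = np.array([1.0, 0.0])
eta = 0.01
for x in X:
    w_norm = normalized_hebbian_step(w_norm, x, eta)
    w_oja = oja_step(w_oja, x, eta)
print("explicitly normalized:", w_norm)
print("Oja approximation:   ", w_oja)

For a small learning rate the two weight trajectories remain close to each other, which is precisely what the power-series argument above asserts.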
However, because the application of the Hebbian rule is largely limited to single-layer neural networks, the original version of the backpropagation algorithm is favoured for training multilayer networks. The training is performed off-line in a supervised learning mode, which is convenient because, in practice, a large number of data are available that have to be processed prior to their use for training. Besides, for forecasting purposes the pairs of related input and output data also have to be built and processed. Finally, the supervised mode of learning facilitates monitoring of the training performance and determining the training stopping point.
When applying the backpropagation algorithm, which is a typical gradient steepest-descent method, decisions have to be made concerning the
• learning rate, i.e. the step size or the magnitude of the weight update
• momentum, which is needed to escape trapping in local minima.
An appropriate selection of learning rate is particularly important because the
steepest descent method suffers from slow convergence and weak robustness.
Accelerating the convergence by taking a larger learning rate bears the danger of oscillatory behaviour of the network around the minimum. To avoid this, and still be able to take a larger learning rate, the addition of a momentum parameter was recommended (Rumelhart et al., 1986). With this, the original learning step according to the delta rule
$$
w(t+1) = w(t) + \eta\, \delta_p(t)\, x_p(t)
$$
is extended by the momentum term to result in
$$
w_{ij}(t+1) = w_{ij}(t) + \eta\, \delta_i(t)\, x_j(t) + \alpha\left[w_{ij}(t) - w_{ij}(t-1)\right].
$$
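As a minimal sketch of how the momentum term enters the weight update, the following Python fragment implements the extended rule above for one weight matrix of a layer. The shapes, parameter values, and function name are illustrative assumptions; the backpropagated error signals δ_i(t) are taken as given rather than computed here.

import numpy as np

def backprop_step_with_momentum(w, w_prev, delta, x, eta=0.1, alpha=0.9):
    """One weight update of the delta rule extended by a momentum term.

    w      -- current weights w_ij(t), shape (n_out, n_in)
    w_prev -- previous weights w_ij(t-1), same shape
    delta  -- backpropagated error signals delta_i(t), shape (n_out,)
    x      -- inputs to the layer x_j(t), shape (n_in,)
    """
    gradient_term = eta * np.outer(delta, x)   # eta * delta_i(t) * x_j(t)
    momentum_term = alpha * (w - w_prev)       # alpha * [w_ij(t) - w_ij(t-1)]
    w_new = w + gradient_term + momentum_term
    return w_new, w                            # updated weights and new "previous" weights

# Illustrative usage with assumed shapes and values
rng = np.random.default_rng(1)
w = rng.normal(size=(3, 4))    # 4 inputs, 3 outputs
w_prev = w.copy()              # no history yet, so the momentum term starts at zero
delta = rng.normal(size=3)     # error signals from the layer above
x = rng.normal(size=4)         # layer inputs
w, w_prev = backprop_step_with_momentum(w, w_prev, delta, x)

Keeping the previous weights w_ij(t-1) is all that is needed to form the momentum term; with α = 0 the update reduces to the plain delta rule.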