method reduces the rank of the weights in each layer by deleting the smallest salient eigen-nodes. Finally, the proposed method does not require network training.
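As a rough illustration of this kind of rank reduction, the sketch below truncates the singular value decomposition of a layer's weight matrix, discarding the least salient components. The function name, the use of SVD as the decomposition, and the fixed target rank are assumptions made purely for illustration, not details of the method described above.

import numpy as np

def reduce_rank(W, keep):
    # Decompose the weight matrix and keep only the `keep` largest
    # (most salient) singular components; the rest are deleted.
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    return U[:, :keep] @ np.diag(s[:keep]) @ Vt[:keep, :]

rng = np.random.default_rng(0)
W = rng.standard_normal((8, 8))          # a hypothetical layer weight matrix
W_reduced = reduce_rank(W, keep=4)
print(np.linalg.matrix_rank(W_reduced))  # prints 4

Because the discarded components are the smallest ones, the reduced matrix stays close to the original while containing fewer effective parameters, which is the general idea behind this family of pruning methods.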
A network pruning approach is preferable when designing networks with a high generalization capability, i.e. networks that are not only good enough to solve the prediction or classification problems present in the training set, but can also solve similar problems using fresh, previously unseen sets of data. This is achieved through a trade-off between the intention that the trained network should be capable of learning a broad spectrum of similar problem categories, which would require a large network, and the requirement that the network should be as simple as possible, in order to avoid overtraining.
In the practical application of trained networks there is a fundamental recommendation: where several trained networks have approximately the same final performance, the structurally simplest network should be selected as the one with the best generalization. This recommendation reflects the philosophy of Occam's razor, which holds that a scientific model should favour simplicity.
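As a small illustration of this selection rule, the sketch below picks, from a set of hypothetical trained networks, the structurally simplest one among those whose validation error is within a tolerance of the best. The candidate list, the tolerance value, and the use of parameter count as the complexity measure are all assumptions for illustration.

# (validation_error, number_of_weights) for hypothetical trained networks
candidates = [(0.101, 850), (0.100, 420), (0.099, 1200), (0.150, 90)]

best_err = min(err for err, _ in candidates)
tolerance = 0.005            # assumed: errors this close count as "the same"
comparable = [c for c in candidates if c[0] <= best_err + tolerance]

# Among comparably performing networks, prefer the fewest parameters.
chosen = min(comparable, key=lambda c: c[1])
print(chosen)                # -> (0.1, 420)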
Many training strategies have been investigated for network simplification at lower training cost. Such strategies have been developed within the framework of minimization of the error function extended by a penalty term. To this category of strategies belong:
• the weight decay approach (Hinton, 1989), a subset of regularization approaches in which the weight tuning rule is augmented by a complexity penalty term,

$\Delta w_{ij}(t) = \eta \delta_i x_j - \lambda w_{ij}(t)$,

that penalizes large weight values;
• the weight elimination approach (Weigend et al., 1991), based on minimization of a network training cost function to which a term is added that accounts for the number of parameters, giving the tuning rule

$\Delta w_{ij}(t) = \eta \delta_i x_j - \lambda \frac{2 w_{ij}(t)}{\left[1 + w_{ij}^{2}(t)\right]^{2}}$,
where λ represents the weight decay constant, δ is the local error, x is the local activation, and η is the learning rate. Both tuning rules are illustrated in the sketch below.
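As a rough sketch of how these two tuning rules behave, the following Python fragment applies a single update step of each rule to one weight; the variable names and the numerical values of η, λ, δ and x are assumptions chosen purely for illustration.

eta, lam = 0.1, 0.01        # assumed learning rate and decay constant
delta_i, x_j = 0.5, 1.0     # assumed local error and local activation

def weight_decay_step(w):
    # Delta w = eta*delta*x - lambda*w: the penalty grows with |w|,
    # so large weights are shrunk the most.
    return w + eta * delta_i * x_j - lam * w

def weight_elimination_step(w):
    # Delta w = eta*delta*x - lambda*2w/(1 + w^2)^2: the penalty
    # gradient is largest for small |w| and fades for large weights.
    return w + eta * delta_i * x_j - lam * 2 * w / (1 + w**2) ** 2

for w in (0.1, 5.0):        # one small and one large weight
    print(w, weight_decay_step(w), weight_elimination_step(w))

Running this for a small and a large weight makes the contrast discussed next concrete: the decay penalty −λw keeps growing with the weight, whereas the elimination penalty almost vanishes for large weights, leaving large, presumably important, weights nearly untouched.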
In contrast to weight decay, which shrinks large weight values more than small ones, weight elimination shrinks predominantly the small weight values and is, to a certain degree, similar to the pruning process. Hansen and Rasmussen (1994) have demonstrated that network pruning may result when the weight decay parameter is determined by the data. The added term penalizes large weight values, forcing them to take small absolute values, while retaining the other values essentially unchanged. This, however, is favourable in preventing worsening of