the optimal combination like the nonlinear combinations are. This can be demonstrated with the following example.
Suppose that k different forecast models are available and that the i-th individual forecast has an information set {I_i: I_c, I_i}, where I_c is the common part of the information used by all k models and I_i is the specific information available to the i-th forecast only. Denoting the i-th forecast by f_i = F_i(I_i), we can express the linear combination of forecasts as

F_c = Σ_i w_i F_i(I_i),
where w_i is the weight of the i-th forecast. On the other hand, every individual forecasting model can also be regarded as a subsystem for information processing, while the combination model f_c = F_c(I_1, I_2, ..., I_k) is regarded as the overall system. It follows that the integration of forecasts is more than their sum, i.e. the performance of the integrated system exceeds the sum of the performances of its subsystems. The trustworthiness of a purely linear forecast combination is therefore questionable, and more trust should be placed in a nonlinear interrelation between the individual forecasts, such as

f_c = ψ[F_1(I_1), F_2(I_2), F_3(I_3), ..., F_k(I_k)],

where ψ is a nonlinear function. While the given information is processed by the individual forecasting models, parts of it are likely to be lost, which means that, say, the information set I_i is not used efficiently. Furthermore, different forecasts may lose different parts of the information. This is why it is preferable to include as many different forecasts as possible in the combination, even when the individual forecasts depend on the same information set.
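To illustrate the difference between the two combination schemes, the sketch below builds three noisy individual forecasts of the same signal and combines them once linearly, with least-squares weights w_i, and once through a nonlinear combiner ψ realized as a small one-hidden-layer network. The data, the network size, and the training settings are arbitrary choices made for this illustration and are not taken from the text.

```python
import numpy as np

# Illustrative sketch: combining k = 3 individual forecasts of the same target,
# first linearly and then nonlinearly.
rng = np.random.default_rng(0)

# Toy data: three individual forecasts of a common signal, each with its own error.
n = 200
target = np.sin(np.linspace(0, 8 * np.pi, n))
forecasts = np.stack([target + 0.2 * rng.standard_normal(n) for _ in range(3)], axis=1)

# Linear combination F_c = sum_i w_i * F_i(I_i); here the weights are fitted by
# least squares, one common way of choosing them.
w, *_ = np.linalg.lstsq(forecasts, target, rcond=None)
linear_combined = forecasts @ w

# Nonlinear combination f_c = psi[F_1(I_1), ..., F_k(I_k)]; here psi is a tiny
# one-hidden-layer tanh network trained by plain gradient descent, standing in
# for any nonlinear combiner.
W1 = 0.1 * rng.standard_normal((3, 8)); b1 = np.zeros(8)
W2 = 0.1 * rng.standard_normal((8, 1)); b2 = np.zeros(1)
lr = 0.05
for _ in range(2000):
    h = np.tanh(forecasts @ W1 + b1)          # hidden layer
    y = (h @ W2 + b2).ravel()                 # combined forecast
    err = y - target
    # Backpropagate the mean-squared-error gradient through both layers.
    g_y = 2 * err[:, None] / n
    g_W2 = h.T @ g_y; g_b2 = g_y.sum(0)
    g_h = g_y @ W2.T * (1 - h ** 2)
    g_W1 = forecasts.T @ g_h; g_b1 = g_h.sum(0)
    W2 -= lr * g_W2; b2 -= lr * g_b2
    W1 -= lr * g_W1; b1 -= lr * g_b1

print("MSE linear   :", np.mean((linear_combined - target) ** 2))
print("MSE nonlinear:", np.mean((y - target) ** 2))
```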
As a forecasting example (Palit and Popovic, 2000), a 2-6-6-1 feedforward network, i.e. a network with two inputs, two hidden layers of six neurons each, and one output, is used, as shown in Figure 3.19b. The network is trained using the Levenberg-Marquardt algorithm, which achieves a much faster learning speed than the standard backpropagation method and hence requires less training time. The algorithm also uses the gradient descent method, based on the Jacobian matrix, according to which the update is
Δw = [J^T(w) J(w) + μI]^{-1} J^T(w) e(x)

or

w(k+1) = w(k) - [J^T(w(k)) J(w(k)) + μI]^{-1} J^T(w(k)) e(w(k)),
where J(w) is the Jacobian matrix with respect to the network-adjustable parameters w (all weights and biases), of dimension (q×N_p), with q being the number of
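To make the update above concrete, the following minimal sketch performs a single Levenberg-Marquardt step from a user-supplied Jacobian and error vector. The function names, the layout of J (one row per error term, one column per parameter), and the fixed value of μ are assumptions made for illustration; in practice μ is adapted during training, being increased when a step raises the error and decreased otherwise.

```python
import numpy as np

def lm_step(w, jacobian_fn, error_fn, mu):
    """One Levenberg-Marquardt update:
    w(k+1) = w(k) - [J^T J + mu*I]^{-1} J^T e
    jacobian_fn(w) and error_fn(w) are callables assumed to be supplied by the caller."""
    J = jacobian_fn(w)                      # one row per error term, one column per parameter
    e = error_fn(w)                         # vector of output errors
    A = J.T @ J + mu * np.eye(J.shape[1])   # damped Gauss-Newton matrix
    return w - np.linalg.solve(A, J.T @ e)

# Tiny usage example: fit y = a*x + b to toy data with repeated LM steps.
x = np.linspace(0.0, 1.0, 20)
y = 2.0 * x + 1.0
error = lambda w: (w[0] * x + w[1]) - y              # e(w) = model output minus target
jac = lambda w: np.stack([x, np.ones_like(x)], 1)    # derivative of e with respect to w
w = np.zeros(2)
for _ in range(20):
    w = lm_step(w, jac, error, mu=0.01)
print(w)   # approaches [2.0, 1.0]
```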