the optimal combination like the nonlinear combinations are. This can be demonstrated with the following example.
Suppose that k different forecast models are available and that the i-th individual forecast has an information set {I_i: I_c, I_i}, where I_c is the common part of the information used by all k models and I_i is the specific information available to the i-th forecast only. Denoting the i-th forecast by f_i = F_i(I_i), we can express the linear combination of forecasts as

F_c = Σ_i w_i F_i(I_i),
where w_i is the weight of the i-th forecast. On the other hand, every individual forecasting model can also be regarded as a subsystem for information processing, while the combination model f_c = F_c(I_1, I_2, ..., I_k) is regarded as the overall system. It follows that the integration of forecasts is more than their sum, i.e. the performance of the integrated system exceeds the sum of the performances of its subsystems. The trustworthiness of a purely linear forecast combination is therefore questionable, and more trust should be placed in a nonlinear interrelation between the individual forecasts, such as

f_c = ψ[F_1(I_1), F_2(I_2), F_3(I_3), ..., F_k(I_k)],

where ψ is a nonlinear function. While the given information is processed by the individual forecasting models, parts of it are likely to be lost, which means that, say, the information set I_i is not used efficiently. Furthermore, different forecasts may lose different parts of the information. This is why it is preferable to include as many different forecasts as possible in the combination, even when the individual forecasts depend on the same information set.
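To illustrate the difference between the two combination schemes, the sketch below builds three noisy individual forecasts of the same signal and combines them once linearly, with least-squares weights w_i, and once through a nonlinear combiner ψ realized as a small one-hidden-layer network. The data, the network size, and the training settings are arbitrary choices made for this illustration and are not taken from the text.

```python
import numpy as np

# Illustrative sketch: combining k = 3 individual forecasts of the same target,
# first linearly and then nonlinearly.
rng = np.random.default_rng(0)

# Toy data: three individual forecasts of a common signal, each with its own error.
n = 200
target = np.sin(np.linspace(0, 8 * np.pi, n))
forecasts = np.stack([target + 0.2 * rng.standard_normal(n) for _ in range(3)], axis=1)

# Linear combination F_c = sum_i w_i * F_i(I_i); here the weights are fitted by
# least squares, one common way of choosing them.
w, *_ = np.linalg.lstsq(forecasts, target, rcond=None)
linear_combined = forecasts @ w

# Nonlinear combination f_c = psi[F_1(I_1), ..., F_k(I_k)]; here psi is a tiny
# one-hidden-layer tanh network trained by plain gradient descent, standing in
# for any nonlinear combiner.
W1 = 0.1 * rng.standard_normal((3, 8)); b1 = np.zeros(8)
W2 = 0.1 * rng.standard_normal((8, 1)); b2 = np.zeros(1)
lr = 0.05
for _ in range(2000):
    h = np.tanh(forecasts @ W1 + b1)          # hidden layer
    y = (h @ W2 + b2).ravel()                 # combined forecast
    err = y - target
    # Backpropagate the mean-squared-error gradient through both layers.
    g_y = 2 * err[:, None] / n
    g_W2 = h.T @ g_y; g_b2 = g_y.sum(0)
    g_h = g_y @ W2.T * (1 - h ** 2)
    g_W1 = forecasts.T @ g_h; g_b1 = g_h.sum(0)
    W2 -= lr * g_W2; b2 -= lr * g_b2
    W1 -= lr * g_W1; b1 -= lr * g_b1

print("MSE linear   :", np.mean((linear_combined - target) ** 2))
print("MSE nonlinear:", np.mean((y - target) ** 2))
```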
As a forecasting example (Palit and Popovic, 2000), a 2-6-6-1 feedforward network, i.e. a network with two inputs, two hidden layers of six neurons each, and one output, is used, as shown in Figure 3.19b. The network is trained using the Levenberg-Marquardt algorithm, which achieves a much faster learning speed than the standard backpropagation method and hence requires less training time. The algorithm also uses the gradient descent method, based on the Jacobian matrix, according to which the update is
Δw = [J^T(w) J(w) + μI]^{-1} J^T(w) e(x)

or

w(k+1) = w(k) - [J^T(w(k)) J(w(k)) + μI]^{-1} J^T(w(k)) e(w(k)),
where J(w) is the Jacobian matrix with respect to the network-adjustable parameters w (all weights and biases), of dimension (q×N_p), with q being the number of
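To make the update above concrete, the following minimal sketch performs a single Levenberg-Marquardt step from a user-supplied Jacobian and error vector. The function names, the layout of J (one row per error term, one column per parameter), and the fixed value of μ are assumptions made for illustration; in practice μ is adapted during training, being increased when a step raises the error and decreased otherwise.

```python
import numpy as np

def lm_step(w, jacobian_fn, error_fn, mu):
    """One Levenberg-Marquardt update:
    w(k+1) = w(k) - [J^T J + mu*I]^{-1} J^T e
    jacobian_fn(w) and error_fn(w) are callables assumed to be supplied by the caller."""
    J = jacobian_fn(w)                      # one row per error term, one column per parameter
    e = error_fn(w)                         # vector of output errors
    A = J.T @ J + mu * np.eye(J.shape[1])   # damped Gauss-Newton matrix
    return w - np.linalg.solve(A, J.T @ e)

# Tiny usage example: fit y = a*x + b to toy data with repeated LM steps.
x = np.linspace(0.0, 1.0, 20)
y = 2.0 * x + 1.0
error = lambda w: (w[0] * x + w[1]) - y              # e(w) = model output minus target
jac = lambda w: np.stack([x, np.ones_like(x)], 1)    # derivative of e with respect to w
w = np.zeros(2)
for _ in range(20):
    w = lm_step(w, jac, error, mu=0.01)
print(w)   # approaches [2.0, 1.0]
```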