the selection of architecture (generally, the size of the hidden layer).
For recurrent networks, three additional questions must be addressed:
- the choice of the representation (input-output or state representation),
- the choice of the model order,
- the length of the sliding time window for backpropagation through time (BPTT).
It may be quite useful to perform a linear identification first (for which structural
tests are better understood) in order to select the order of the model. The selection
of the truncation horizon for BPTT is also a tricky issue: theoretically, the
observability order of the model is sufficient as a truncation horizon; in practice,
computing time limits the size of the sliding time window.
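As a concrete illustration of the sliding-window scheme, the following is a minimal sketch of truncated BPTT, not the chapter's own algorithm: it assumes a scalar linear recurrent model x(k+1) = a x(k) + b u(k), invented data generated with true parameters 0.8 and 0.5, and an arbitrary truncation horizon T = 5. Each window is restarted from the measured state, and gradients are propagated backwards only over the T steps of the window.

```python
import random

random.seed(0)

# Data generated by an assumed "true" process x(k+1) = 0.8*x(k) + 0.5*u(k).
N = 200
u = [random.uniform(-1.0, 1.0) for _ in range(N)]
x_true = [0.0]
for k in range(N - 1):
    x_true.append(0.8 * x_true[k] + 0.5 * u[k])

a, b = 0.0, 0.0   # model parameters to identify
T = 5             # truncation horizon (length of the sliding window)
lr = 0.02         # learning rate

for epoch in range(100):
    for s in range(N - T):
        # Forward pass over one window, restarted from the measured state.
        xs = [x_true[s]]
        for k in range(s, s + T):
            xs.append(a * xs[-1] + b * u[k])
        # Backward pass: propagate error gradients back through the window.
        grad_x, ga, gb = 0.0, 0.0, 0.0
        for j in range(T, 0, -1):
            grad_x += 2.0 * (xs[j] - x_true[s + j])  # local loss gradient
            ga += grad_x * xs[j - 1]
            gb += grad_x * u[s + j - 1]
            grad_x *= a   # chain rule through x(k+1) = a*x(k) + b*u(k)
        a -= lr * ga / T
        b -= lr * gb / T

print(a, b)   # should approach the true values 0.8 and 0.5
```

Increasing T reduces the truncation bias of the gradient but multiplies the cost of each update, which is exactly the trade-off discussed above.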
Another problem in recurrent network learning is capturing long-range time
dependencies, i.e., influences extending far back into the past. That issue is
investigated in [Bengio 1994]. Nevertheless, long-range time dependencies are
seldom considered in practical applications, because the true physical processes
are not stationary over very long periods: there are slow drifts to cope with,
and adaptive methods (such as those developed here) are then used to update the model.
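A standard adaptive method of this kind is recursive least squares with a forgetting factor. The sketch below is illustrative, not from the text: it assumes a scalar model y(k) = theta(k) u(k) + noise whose parameter drifts slowly, with invented drift rate, noise level, and forgetting factor.

```python
import random

random.seed(1)

lam = 0.97        # forgetting factor: smaller -> faster tracking, more noise
theta_hat = 0.0   # parameter estimate
P = 100.0         # estimate covariance (large value = uninformative prior)

theta_true = 1.0
for k in range(2000):
    theta_true += 0.0005            # slow drift of the true process
    u = random.uniform(-1.0, 1.0)
    y = theta_true * u + random.gauss(0.0, 0.01)
    # Recursive least-squares update with exponential forgetting.
    K = P * u / (lam + u * P * u)   # gain
    theta_hat += K * (y - theta_hat * u)
    P = (P - K * u * P) / lam

print(theta_true, theta_hat)   # the estimate tracks the drifting parameter
```

The forgetting factor gives the estimator an effective memory of about 1/(1 - lam) samples, so the estimate lags the drift by only a small amount instead of averaging over the whole non-stationary record.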
When facing serious difficulties, it is advisable to use a directed, progressive
learning strategy, gradually increasing the time depth of the learning process
and using robust optimization methods. An efficient methodology for practical
applications is "gray box" modeling (presented in Chap. 2), which takes advantage
of all the available knowledge about the process to be modeled: the mathematical
form of the model equations, the order of the model, and so on. The designer then
has a smaller number of issues to address.
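The point of gray-box modeling can be shown with a deliberately small example, with assumed dynamics and data: the model structure x(k+1) = x(k) + dt(-c x(k) + u(k)) is taken as known from physics, and only the single constant c has to be identified, here by a closed-form least-squares projection.

```python
import random

random.seed(2)
dt, c_true = 0.1, 1.5   # assumed sampling step and true physical constant

# Simulate the (noiseless) true process.
x = [1.0]
u = [random.uniform(-1.0, 1.0) for _ in range(200)]
for k in range(200):
    x.append(x[k] + dt * (-c_true * x[k] + u[k]))

# Least-squares estimate of c: the residual x(k+1) - x(k) - dt*u(k)
# equals -dt * c * x(k), so c follows from one closed-form projection.
num = sum((x[k + 1] - x[k] - dt * u[k]) * x[k] for k in range(200))
den = sum(x[k] * x[k] for k in range(200))
c_hat = -num / (dt * den)
print(c_hat)   # recovers c_true
```

Because the structure is fixed by prior knowledge, a single scalar is estimated instead of a full black-box network, which is precisely how gray-box modeling reduces the number of issues the designer must address.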
Of course, data preprocessing (first applying linear regression methods, then
feeding the residuals to nonlinear learning algorithms) often improves the
accuracy of nonlinear identification methods, since the approximation problems
are then correctly decoupled and scaled.
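This two-stage scheme can be sketched as follows; the data, the target function, and the use of a Gaussian kernel smoother as the nonlinear stage are all illustrative assumptions, not the chapter's method. An ordinary least-squares fit removes the linear trend, and the nonlinear learner only has to model the residuals.

```python
import math
import random

random.seed(3)
xs = [random.uniform(-3.0, 3.0) for _ in range(300)]
ys = [2.0 * x + math.sin(3.0 * x) for x in xs]   # linear trend + nonlinearity

# Stage 1: ordinary least squares for the linear part y ~ a*x + b.
n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n
a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) \
    / sum((x - mx) ** 2 for x in xs)
b = my - a * mx
res = [y - (a * x + b) for x, y in zip(xs, ys)]   # residuals

# Stage 2: nonlinear model of the residuals (Gaussian kernel smoother).
def predict(x, h=0.1):
    w = [math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in xs]
    correction = sum(wi * ri for wi, ri in zip(w, res)) / sum(w)
    return a * x + b + correction

err = max(abs(predict(x) - (2.0 * x + math.sin(3.0 * x)))
          for x in [-2.0, 0.0, 1.0, 2.5])
print(a, err)
```

The linear stage captures the dominant slope, so the nonlinear stage works on a small, well-scaled residual signal, which is the decoupling the text refers to.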
Recurrent neural networks may be used to design controllers. That ques-
tion will be addressed in the following chapter.
4.7 Appendix (Algorithms and Theoretical
Developments)
4.7.1 Computation of the Kalman Gain and Covariance
Propagation
Let us consider the Markov stochastic state model
X(k+1) = A X(k) + B u(k) + V(k+1),
with the following measurement equation:
Y(k) = H X(k) + W(k).
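Before the general development, a numerical sketch may help; it assumes the scalar case with invented noise variances Q and R for V and W. The recursion below is the standard covariance propagation and Kalman gain computation (time update followed by measurement update), iterated to its steady state.

```python
A, H = 0.9, 1.0   # assumed state-transition and measurement coefficients
Q, R = 0.1, 0.5   # assumed state-noise and measurement-noise variances

P = 1.0           # initial state-estimation covariance
for k in range(100):
    P_pred = A * P * A + Q                   # covariance propagation
    K = P_pred * H / (H * P_pred * H + R)    # Kalman gain
    P = (1.0 - K * H) * P_pred               # covariance after measurement
print(K, P)
```

The gain K settles between 0 and 1, weighting the measurement against the model prediction; with larger R (noisier measurements) the steady-state gain decreases, as the general equations below make precise.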