provides a unified way of handling those models, whatever their special archi-
tecture may be (delay distribution, etc.). That state representation was called
canonical form and was fully described in Chap. 2.
Any recurrent neural network, however complex, has a minimal state representation, called "canonical form." The algorithms that are described in the previous section may be applied to the canonical form in a straightforward way.
In Chap. 2, the paragraph that is entitled “Canonical form of dynamical
models” and complementary sections address that issue. Several examples are
presented there to illustrate the approach.
4.6 Learning for Recurrent Networks
E. Sontag proved in [Sontag 1996] that recurrent neural networks are universal approximators of controlled, observable, deterministic dynamical systems. Note that, just like Hornik's universal approximation theorem for function approximation, that theorem is not constructive: it provides no indication of either the architecture or the learning algorithm.
The main difficulty in training recurrent neural networks with a descent method (first-order gradient method or second-order method) comes from the time range of the consequences of a change in a weight value: the influence of a weight on the cost function is not limited to the current time, but propagates through the computation horizon, which is theoretically unbounded.
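To make that point concrete, here is a minimal sketch (the scalar linear model and all names are illustrative assumptions, not from the text) showing that the gradient of a cost defined at the end of the horizon accumulates one contribution per time step, so the exact gradient depends on the entire past:

```python
import numpy as np

# Minimal sketch (illustrative assumption): scalar recurrent model
# x[t+1] = w * x[t] + u[t].  The exact derivative dx[T]/dw obeys the
# recursion dx[t+1]/dw = x[t] + w * dx[t]/dw, i.e. it picks up one
# term per time step of the horizon.

def run(w, u, x0=0.0):
    x = x0
    for ut in u:
        x = w * x + ut
    return x

def grad_final_state(w, u, x0=0.0):
    x, dx = x0, 0.0
    for ut in u:
        dx = x + w * dx      # one new contribution per time step
        x = w * x + ut
    return dx

u = np.array([1.0, -0.5, 2.0, 0.3, 1.2])
w = 0.9
g = grad_final_state(w, u)

# Sanity check against a central finite difference.
eps = 1e-6
g_num = (run(w + eps, u) - run(w - eps, u)) / (2 * eps)
print(abs(g - g_num) < 1e-6)  # → True
```

Truncating the recursion after a few steps (as the approximate algorithms below do) amounts to dropping the oldest contributions to `dx`.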
In a rigorous mathematical treatment, the computation of the gradient of the cost function requires propagating the computation, for each example, over the full computation horizon, computing the weight correction, and iterating as necessary. Training a recurrent network would then be a very expensive procedure for very long training sequences, and would be difficult to implement in real-time applications. Therefore, when recurrent neural network architectures were suggested for dynamical system identification and control, approximate solutions were used. The seminal paper [Williams 1989] presents an interesting approach.
When the state of the system is completely known because it is measured at each time step, there is no particular problem: a teacher-forcing algorithm can readily be implemented, although (see Chap. 2) that technique is appropriate only in applications where the relevant uncertainty is modeled by state noise. That approach was shown to perform poorly when measurement noise must be taken into account, which is a very frequent situation in industrial applications.
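As an illustration of teacher forcing under the full-state-measurement assumption, the sketch below (the linear process, its coefficients, and the noise level are assumptions for the example, not from the text) trains a one-step predictor by feeding the *measured* state back at every step, so each time step becomes an independent static training example and no gradient needs to flow through time:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic measured trajectory of a stable linear process with state noise
# (assumed example: x[t+1] = 0.8*x[t] + 0.5*u[t] + noise).
a_true, b_true, T = 0.8, 0.5, 200
u = rng.normal(size=T)
x = np.zeros(T + 1)
for t in range(T):
    x[t + 1] = a_true * x[t] + b_true * u[t] + 0.01 * rng.normal()

# Teacher forcing: the predictor x̂[t+1] = a*x[t] + b*u[t] always receives
# the measured x[t], never its own previous prediction, so plain per-step
# gradient descent on the one-step-ahead squared error suffices.
a, b, lr = 0.0, 0.0, 0.01
for epoch in range(200):
    for t in range(T):
        err = (a * x[t] + b * u[t]) - x[t + 1]
        a -= lr * err * x[t]
        b -= lr * err * u[t]

print(round(a, 2), round(b, 2))  # close to the true values 0.8 and 0.5
```

Note that this works here precisely because the uncertainty is state noise; with significant measurement noise, feeding the measured state back would bias the estimates, which is the weakness the text points out.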
In the general case, where knowledge of the state of the process is corrupted by measurement noise, or where the state is not fully measured, one must choose between two approximations:

Either compute the true gradient with respect to the current weights, but change the cost function by truncating the computation period to a sliding window: that is called back-propagation through time (BPTT)
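The sliding-window truncation can be sketched as follows (a minimal illustration under assumed choices: a scalar tanh recurrence, a window of 5 steps, and a toy teacher signal; none of this is prescribed by the text). The backward pass runs only over the window, so contributions from earlier time steps are deliberately discarded:

```python
import numpy as np

def tbptt_step(w, x0, u_win, d_win, lr):
    """One truncated-BPTT update for x[t+1] = tanh(w*x[t] + u[t])."""
    # Forward pass through the window.
    xs = [x0]
    for u in u_win:
        xs.append(np.tanh(w * xs[-1] + u))
    # Backward pass through the window only: gradient contributions from
    # before the window are discarded (the truncation).
    grad, delta = 0.0, 0.0                        # delta = dJ/dx flowing back
    for k in reversed(range(len(u_win))):
        delta += 2.0 * (xs[k + 1] - d_win[k])     # error injected at step k
        dpre = delta * (1.0 - xs[k + 1] ** 2)     # tanh'(p) = 1 - tanh(p)^2
        grad += dpre * xs[k]                      # contribution to dJ/dw
        delta = dpre * w                          # propagate one step back
    return w - lr * grad, xs[-1]                  # new weight, carried state

# Toy usage: track a "teacher" recurrence with weight 0.7.
rng = np.random.default_rng(1)
u = rng.normal(size=400)
x_teacher, d = 0.0, []
for ut in u:
    x_teacher = np.tanh(0.7 * x_teacher + ut)
    d.append(x_teacher)

w, x0, window = 0.0, 0.0, 5
for i in range(0, len(u) - window + 1, window):
    w, x0 = tbptt_step(w, x0, u[i:i + window], d[i:i + window], lr=0.05)
# w should have moved toward the teacher weight 0.7
```

Within a window the computed gradient is exact; the approximation lies entirely in ignoring how the window's initial state depends on the weights, which keeps the cost per update bounded regardless of sequence length.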