stated in Chap. 2. The measurement equation must be linearized to update the
covariance and to recursively compute the Kalman gain as described in the
previous section. Here, the state evolution is simply a random walk and the noise covariances are constant, so the filter equations take a simpler form.
If H(k+1) stands for the gradient of the network output g with respect to the weight vector w at the point [x(k+1), w(k)], we get

$$P_{k+1} = P_k + Q,$$

$$K_{k+1} = P_{k+1}\, H(k+1)^T \left[ H(k+1)\, P_{k+1}\, H(k+1)^T + R \right]^{-1},$$

$$P_{k+1} = \left[ I - K_{k+1} H(k+1) \right] P_{k+1} \left[ I - K_{k+1} H(k+1) \right]^T + K_{k+1} R\, K_{k+1}^T,$$
where Q and R are the classical notations for covariance matrices of the state
noise and of the measurement noise in Kalman filtering theory. The equation
of the filter is
$$w(k+1) = w(k) + K_{k+1}\, \vartheta(k+1),$$

with

$$\vartheta(k+1) = y(k+1) - g[x(k+1), w(k)].$$
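To make the update concrete, here is a minimal numpy sketch of one filter step under the assumptions above (scalar network output, random-walk state, constant Q and R). All names (ekf_step, grad_g, and so on) are illustrative, not taken from the text; grad_g stands for whatever backpropagation returns at [x(k+1), w(k)].

```python
import numpy as np

def ekf_step(w, P, x, y, g, grad_g, Q, R):
    """One extended Kalman filter update of the weight vector w.

    w      : current weight estimate, shape (n,)
    P      : weight-error covariance, shape (n, n)
    x, y   : current input and desired scalar output
    g      : network output function, g(x, w) -> scalar
    grad_g : gradient of g with respect to w at (x, w), shape (n,)
    Q, R   : state-noise covariance (n, n) and measurement-noise variance
    """
    # Prediction: the state is a random walk, so only the covariance grows.
    P = P + Q

    # Linearize the measurement equation around the current weights.
    H = grad_g(x, w).reshape(1, -1)              # row vector, shape (1, n)

    # Kalman gain; the bracketed term is a scalar for a single output.
    S = float(H @ P @ H.T) + R                   # innovation variance
    K = P @ H.T / S                              # shape (n, 1)

    # Innovation and weight update: w(k+1) = w(k) + K ϑ(k+1).
    innovation = y - g(x, w)
    w = w + (K * innovation).ravel()

    # Joseph-form covariance update, which preserves positive definiteness.
    I_KH = np.eye(len(w)) - K @ H
    P = I_KH @ P @ I_KH.T + (K * R) @ K.T
    return w, P

# Hypothetical usage on a linear "network" g(x, w) = w·x:
#   w, P = ekf_step(w, P, x, y, lambda x, w: w @ x,
#                   lambda x, w: np.asarray(x), Q, R)
```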
It should be emphasized that the neural network under identification is a virtual object: the only configuration that actually exists is the current estimate w(k). The ideal configuration that we are trying to identify or track has no real existence; it is an approximate representation of the real process.
The equation of the filter has the form of the nonadaptive optimization algorithms that were reviewed in Chap. 2. Here, however, the descent direction is not the gradient of the quadratic error, which is equal to H(k+1)^T ϑ(k+1) and may be computed using the backpropagation algorithm. The Kalman filter training algorithm is actually a second-order method, but an adaptive one, by contrast to the methods that were presented in Chap. 2. The curvature of the error surface is estimated by updating the covariance matrices. The implementation problems are similar to those of other second-order algorithms (inversion of a large matrix, positivity constraint) and are overcome by similar algorithmic techniques.
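As a toy numeric illustration of this point (all values below are ours, chosen for the example), the first-order direction H(k+1)^T ϑ(k+1) and the filter direction K_{k+1} ϑ(k+1) differ only by the covariance scaling, which is where the curvature information enters:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
P = np.diag([10.0, 1.0, 0.1, 0.01])   # hypothetical covariance: very uncertain
                                      # about the first weight, sure of the last
H = rng.standard_normal((1, n))       # linearized measurement row
R = 0.5                               # measurement-noise variance
innovation = 1.0                      # the error ϑ(k+1)

gradient_dir = (H.T * innovation).ravel()        # plain first-order direction
S = float(H @ P @ H.T) + R                       # innovation variance
kalman_dir = (P @ H.T * innovation / S).ravel()  # EKF direction K ϑ

# Weights the filter is uncertain about move far more than confident ones.
print(gradient_dir)
print(kalman_dir)
```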
In order to reduce the complexity of the covariance matrix update, a decoupled extended Kalman filter (DEKF) technique was proposed in the literature: the parameters are grouped into clusters that are assumed to be mutually uncorrelated. For instance, a cluster may be the set of weights afferent to a single neuron, together with the associated bias. The covariance matrix then acquires a block structure, so that it is easier to update and to invert [Puskorius et al. 1994; Haykin 1999].
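Keeping the illustrative numpy conventions used above, one decoupled step might be sketched as follows; the cluster layout (one block per list entry) and all names are our assumptions, not a reference implementation of [Puskorius et al. 1994].

```python
import numpy as np

def dekf_step(w_blocks, P_blocks, H_blocks, innovation, Q_blocks, R):
    """One decoupled EKF step; each list entry is one weight cluster.

    w_blocks : weight sub-vectors, e.g. one per neuron (with its bias)
    P_blocks : per-cluster covariance blocks; cross-cluster terms are dropped
    H_blocks : the matching (1, n_i) slices of the gradient row H(k+1)
    """
    # Random-walk prediction, block by block.
    P_pred = [P + Q for P, Q in zip(P_blocks, Q_blocks)]

    # The innovation variance still sums contributions from every cluster,
    # since the discarded cross-covariances are assumed to be zero.
    S = R + sum(float(H @ P @ H.T) for H, P in zip(H_blocks, P_pred))

    new_w, new_P = [], []
    for w, P, H in zip(w_blocks, P_pred, H_blocks):
        K = P @ H.T / S                          # small per-block gain, (n_i, 1)
        new_w.append(w + (K * innovation).ravel())
        I_KH = np.eye(len(w)) - K @ H
        new_P.append(I_KH @ P @ I_KH.T + (K * R) @ K.T)
    return new_w, new_P
```

Only the small per-block matrices are stored, updated and (implicitly) inverted, which is where the savings over the full filter come from.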
The Kalman filter training method is not commonly used because it is
relatively complex to implement. Nevertheless, it is potentially very interest-
ing, because it is a second-order adaptive method. The choice of covariance
matrices may seem arbitrary. That can be used advantageously to express