where $C_x = W \Lambda W^T$ represents the EVD of $C_x$, with $W$ an orthogonal $n \times n$ matrix and $\Lambda = \mathrm{Diag}(\lambda_1, \ldots, \lambda_n)$. This is quite a natural criterion for statistical estimation purposes, even if the minimum variance property of the likelihood functional is actually an asymptotic property. To derive an adaptive algorithm, a gradient ascent procedure has been proposed in [18] in which a new data vector $x(k)$ is used at each time iteration $k$ of the maximization of (4.64). Using the differential of $L(W, \Lambda)$ defined on the manifold of $n \times n$ orthogonal matrices [see [21, pp. 62-63] or Exercise 4.15 (4.93)], we obtain the following gradient of $L(W, \Lambda)$
$$\nabla_W L = W\,[\Lambda^{-1} y(k)\,y^T(k) - y(k)\,y^T(k)\,\Lambda^{-1}]$$
$$\nabla_\Lambda L = -\Lambda^{-1} + \Lambda^{-2}\,\mathrm{Diag}[\,y(k)\,y^T(k)\,]$$
where $y(k) \overset{\mathrm{def}}{=} W^T x(k)$. Then, the stochastic gradient update of $W$ yields
$$W(k+1) = W(k) + \mu_k\,W(k)\,[\Lambda^{-1}(k)\,y(k)\,y^T(k) - y(k)\,y^T(k)\,\Lambda^{-1}(k)] \qquad (4.65)$$
$$\Lambda(k+1) = \Lambda(k) + \mu_k'\,[\Lambda^{-2}(k)\,\mathrm{Diag}[\,y(k)\,y^T(k)\,] - \Lambda^{-1}(k)] \qquad (4.66)$$
where the stepsizes $\mu_k$ and $\mu_k'$ are possibly different. We note that, starting from an orthonormal matrix $W(0)$, the sequence of estimates $W(k)$ given by (4.65) is orthonormal only up to the second-order term in $\mu_k$. To ensure the convergence of this algorithm in practice, it has been shown in [18] that it is necessary to orthonormalize $W(k)$ quite often to compensate for the orthonormality drift in $O(\mu_k)$.
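As a concrete illustration, here is a minimal NumPy sketch of one iteration of the plain stochastic gradient updates (4.65)-(4.66), together with a QR-based re-orthonormalization to compensate for the drift. The function names, the stepsize variables (`mu_w`, `mu_l`), and the choice of QR for the re-orthonormalization are assumptions of this sketch, not details taken from [18].

```python
import numpy as np

def gradient_step(W, lam, x, mu_w, mu_l):
    """One plain stochastic-gradient iteration of (4.65)-(4.66).

    W   : (n, n) current orthonormal estimate of the eigenvectors
    lam : (n,)   current eigenvalue estimates (diagonal of Lambda)
    x   : (n,)   new data vector x(k)
    """
    y = W.T @ x                                           # y(k) = W^T x(k)
    M = np.outer(y / lam, y) - np.outer(y, y / lam)       # Lambda^{-1} y y^T - y y^T Lambda^{-1}
    W_new = W + mu_w * (W @ M)                            # (4.65): orthonormal only up to O(mu_w^2)
    lam_new = lam + mu_l * (y**2 / lam**2 - 1.0 / lam)    # (4.66): diagonal terms only
    return W_new, lam_new

def reorthonormalize(W):
    """Compensate for the O(mu) orthonormality drift, e.g. every few iterations."""
    Q, R = np.linalg.qr(W)
    return Q * np.sign(np.diag(R))                        # fix column signs for continuity
```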
Using continuous-time system theory and differential geometry [21], a modification of (4.65) has been proposed in [18]. It is clear that $\nabla_W L$ is tangent to the curve defined by
$$W(t) = W(0)\,\exp\big[\,t\,(\Lambda^{-1} y(k)\,y^T(k) - y(k)\,y^T(k)\,\Lambda^{-1})\,\big]$$
for $t = 0$, where the matrix exponential is defined, for example, in [35, Chap. 11].
Furthermore, we note that this curve lies in the manifold of orthogonal matrices if $W(0)$ is orthogonal, because $\exp(A)$ is orthogonal if and only if $A$ is skew-symmetric ($A^T = -A$), and the matrix $\Lambda^{-1} y(k)\,y^T(k) - y(k)\,y^T(k)\,\Lambda^{-1}$ is clearly skew-symmetric. Moving along the curve $W(t)$ from the point $t = 0$ in the direction of increasing values of $\nabla_W L$ amounts to letting $t$ increase.
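A short numerical check of this argument, using arbitrary illustrative values (not taken from the text): the matrix $\Lambda^{-1} y\,y^T - y\,y^T\,\Lambda^{-1}$ is skew-symmetric, so every point of the curve $W(t)$ remains orthogonal.

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(0)
n = 4
W0, _ = np.linalg.qr(rng.standard_normal((n, n)))   # arbitrary orthogonal W(0)
lam = rng.uniform(1.0, 5.0, n)                       # arbitrary positive eigenvalues
y = rng.standard_normal(n)

A = np.outer(y / lam, y) - np.outer(y, y / lam)      # Lambda^{-1} y y^T - y y^T Lambda^{-1}
assert np.allclose(A.T, -A)                          # skew-symmetric

for t in (0.0, 0.1, 1.0):
    Wt = W0 @ expm(t * A)                            # point on the curve W(t)
    print(t, np.linalg.norm(Wt.T @ Wt - np.eye(n)))  # ~1e-15: W(t) stays orthogonal
```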
Thus, a discretized version of the optimization of $L(W, \Lambda)$ as a continuous function of $W$ is given by the following update scheme
$$W(k+1) = W(k)\,\exp\big\{\mu_k\,[\Lambda^{-1}(k)\,y(k)\,y^T(k) - y(k)\,y^T(k)\,\Lambda^{-1}(k)]\big\} \qquad (4.67)$$
and the coupled update equations (4.66) and (4.67) form the MALASE algorithm. As mentioned above, the update factor $\exp\{\mu_k[\Lambda^{-1}(k)\,y(k)\,y^T(k) - y(k)\,y^T(k)\,\Lambda^{-1}(k)]\}$ is an orthogonal matrix. This ensures that the orthonormality property is preserved by the MALASE algorithm, provided that the algorithm is initialized with an orthogonal matrix $W(0)$.
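The following sketch assembles one MALASE iteration from (4.66) and (4.67). It is an illustrative implementation under assumptions of this sketch (the function name and the use of SciPy's `expm` to evaluate the matrix exponential are choices made here, not specified in [18]); since the update factor is orthogonal, $W(k)$ remains orthonormal to machine precision without any re-orthonormalization.

```python
import numpy as np
from scipy.linalg import expm

def malase_step(W, lam, x, mu_w, mu_l):
    """One iteration of the coupled updates (4.66)-(4.67) (illustrative sketch)."""
    y = W.T @ x                                           # y(k) = W^T x(k)
    A = np.outer(y / lam, y) - np.outer(y, y / lam)       # skew-symmetric gradient term
    W_new = W @ expm(mu_w * A)                            # (4.67): orthogonal update factor, no drift
    lam_new = lam + mu_l * (y**2 / lam**2 - 1.0 / lam)    # (4.66)
    return W_new, lam_new

# Typical use: starting from an orthogonal W(0), norm(W.T @ W - I) stays ~1e-15
# across iterations, in contrast with the plain gradient update (4.65).
```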
However, it has been shown by the numerical experiments