Digital Signal Processing Reference
where $C_x = W \Lambda W^T$ represents the EVD of $C_x$, with $W$ an orthogonal $n \times n$ matrix and $\Lambda = \mathrm{Diag}(\lambda_1, \ldots, \lambda_n)$. This is a natural criterion for statistical estimation purposes, even though the minimum variance property of the likelihood functional is actually an asymptotic property. To derive an adaptive algorithm, a gradient ascent procedure has been proposed in [18] in which a new data vector $x(k)$ is used at each iteration $k$ of the maximization of (4.64). Using the differential of $L(W, \Lambda)$ defined on the manifold of $n \times n$ orthogonal matrices [see [21, pp. 62-63] or Exercise 4.15 (4.93)], we obtain the following gradients of $L(W, \Lambda)$:
$$\nabla_W L = W\left[\Lambda^{-1} y(k) y^T(k) - y(k) y^T(k) \Lambda^{-1}\right]$$

$$\nabla_\Lambda L = -\Lambda^{-1} + \Lambda^{-2}\,\mathrm{Diag}\left[y(k) y^T(k)\right]$$

where $y(k) \stackrel{\mathrm{def}}{=} W^T x(k)$. Then, the stochastic gradient updates of $W$ and $\Lambda$ yield

$$W(k+1) = W(k) + \mu_k W(k)\left[\Lambda^{-1}(k) y(k) y^T(k) - y(k) y^T(k) \Lambda^{-1}(k)\right] \tag{4.65}$$

$$\Lambda(k+1) = \Lambda(k) + \mu'_k\left[\Lambda^{-2}(k)\,\mathrm{Diag}\left[y(k) y^T(k)\right] - \Lambda^{-1}(k)\right] \tag{4.66}$$
where the stepsizes $\mu_k$ and $\mu'_k$ may differ. We note that, starting from an orthonormal matrix $W(0)$, the sequence of estimates $W(k)$ given by (4.65) is orthonormal only up to the second-order term in $\mu_k$: if $W^T(k) W(k) = I$ and $S(k)$ denotes the skew-symmetric bracket in (4.65), then $W^T(k+1) W(k+1) = I + \mu_k^2 S^T(k) S(k)$. To ensure convergence of this algorithm in practice, it has been shown in [18] that $W(k)$ must be orthonormalized quite often to compensate for this orthonormality drift in $O(\mu_k^2)$. Using continuous-time system theory and differential geometry [21], a modification of (4.65) has been proposed in [18]. It is clear that
$\nabla_W L$ is tangent to the curve defined by

$$W(t) = W(0) \exp\left[t\left(\Lambda^{-1} y(k) y^T(k) - y(k) y^T(k) \Lambda^{-1}\right)\right]$$

at $t = 0$, where the matrix exponential is defined, for example, in [35, Chap. 11].
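The orthonormality drift of (4.65) and the tangency of the curve $W(t)$ can both be checked numerically. The following NumPy sketch is illustrative only and is not the implementation of [18]: the dimension $n = 4$, the diagonal of $\Lambda$, the random data vector and the helper names (`skew_direction`, `expm_skew`, `orth_error`) are hypothetical choices. It compares the orthonormality error of the additive step (4.65), which should shrink by a factor of about four when $\mu_k$ is halved because it is second order in $\mu_k$, with that of a step of the same length along $W(t)$, and it verifies by central differences that the derivative of $W(t)$ at $t = 0$ equals $W(0)$ times the skew-symmetric bracket, that is, $\nabla_W L$ at $W = W(0)$.

```python
import numpy as np


def skew_direction(lam, y):
    """Return S = Lambda^{-1} y y^T - y y^T Lambda^{-1}, so that grad_W L = W S."""
    yyT = np.outer(y, y)
    inv = 1.0 / lam  # diagonal of Lambda^{-1}
    return inv[:, None] * yyT - yyT * inv[None, :]


def expm_skew(A):
    """exp(A) for a real skew-symmetric A, from the eigendecomposition of the Hermitian iA."""
    w, V = np.linalg.eigh(1j * A)  # iA = V diag(w) V^H with w real
    return (V @ np.diag(np.exp(-1j * w)) @ V.conj().T).real


def orth_error(W):
    """Frobenius norm of W^T W - I (zero for an orthogonal matrix)."""
    return np.linalg.norm(W.T @ W - np.eye(W.shape[1]))


rng = np.random.default_rng(0)
n = 4
W0, _ = np.linalg.qr(rng.standard_normal((n, n)))  # orthogonal W(0)
lam = np.array([4.0, 2.0, 1.0, 0.5])  # hypothetical diagonal of Lambda
x = rng.standard_normal(n)  # one hypothetical data vector x(k)
y = W0.T @ x  # y(k) = W^T x(k)
S = skew_direction(lam, y)
assert np.allclose(S.T, -S)  # the bracket is skew-symmetric

# Orthonormality error: additive step (4.65) versus a step of length mu along W(t).
errs_add, errs_exp = [], []
for mu in (1e-2, 5e-3):
    W_add = W0 + mu * W0 @ S
    W_exp = W0 @ expm_skew(mu * S)
    errs_add.append(orth_error(W_add))
    errs_exp.append(orth_error(W_exp))
    print(f"mu={mu:.0e}  additive={errs_add[-1]:.2e}  along curve={errs_exp[-1]:.2e}")
ratio = errs_add[0] / errs_add[1]  # about 4 if the drift is second order in mu

# Tangency: the central difference of W(t) at t = 0 should approach W(0) S.
h = 1e-6
dW = (W0 @ expm_skew(h * S) - W0 @ expm_skew(-h * S)) / (2 * h)
tangent_rel_err = np.linalg.norm(dW - W0 @ S) / np.linalg.norm(W0 @ S)
print(f"drift ratio={ratio:.3f}  tangent relative error={tangent_rel_err:.1e}")
```

Computing $\exp(A)$ from the eigendecomposition of the Hermitian matrix $iA$ is valid here only because $A$ is real skew-symmetric; for a general matrix, a general-purpose method such as scaling and squaring would be needed.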
Furthermore, we note that this curve lies in the manifold of orthogonal matrices if $W(0)$ is orthogonal, because $\exp(A)$ is orthogonal whenever $A$ is skew-symmetric ($A^T = -A$), and the matrix $\Lambda^{-1} y(k) y^T(k) - y(k) y^T(k) \Lambda^{-1}$ is clearly skew-symmetric. Moving along the curve $W(t)$ from the point $t = 0$ in the direction of $\nabla_W L$ amounts to letting $t$ increase. Thus, a discretized version of the optimization of $L(W, \Lambda)$ as a continuous function of $W$ is given by the following update scheme

$$W(k+1) = W(k) \exp\left\{\mu_k\left[\Lambda^{-1}(k) y(k) y^T(k) - y(k) y^T(k) \Lambda^{-1}(k)\right]\right\} \tag{4.67}$$

and the coupled update equations (4.66) and (4.67) form the MALASE algorithm. As mentioned above, the update factor $\exp\{\mu_k[\Lambda^{-1}(k) y(k) y^T(k) - y(k) y^T(k) \Lambda^{-1}(k)]\}$ is an orthogonal matrix. This ensures that the orthonormality property is preserved by the MALASE algorithm, provided that the algorithm is initialized with an orthogonal matrix $W(0)$. However, it has been shown by the numerical experiments