algorithm. A very small absolute value of μ theoretically guarantees monotonic convergence to a local stationary point of the contrast, but convergence may be too slow since the update also takes very small steps in the gradient direction. To speed up the algorithm, μ may be increased, but then the algorithm may oscillate around a local extremum without settling down, a phenomenon known as misadjustment [37], or even risk divergence. This speed-quality trade-off is common to both batch (operating over a signal block) and adaptive (stochastic, recursive, sample-by-sample) implementations of gradient-based optimization algorithms [1, 2, 4].
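As a toy illustration of this trade-off (the quadratic contrast and noise level below are invented for the example, not taken from the text), the following sketch runs stochastic gradient ascent with a small and a large step size. The noisy gradient stands in for sample-by-sample adaptive estimation:

```python
import numpy as np

def noisy_gradient_ascent(mu, n_iter=500, w0=0.0, seed=0):
    """Stochastic gradient ascent on the toy contrast J(w) = -(w - 2)^2,
    whose maximum sits at w = 2.  The gradient is corrupted by noise to
    mimic sample-by-sample (adaptive) estimation; contrast and noise
    level are invented for illustration."""
    rng = np.random.default_rng(seed)
    w, traj = w0, []
    for _ in range(n_iter):
        g = -2.0 * (w - 2.0) + rng.normal(scale=0.5)  # noisy gradient of J
        w += mu * g
        traj.append(w)
    return np.array(traj)

slow = noisy_gradient_ascent(mu=0.01)  # creeps toward w = 2, little jitter
fast = noisy_gradient_ascent(mu=0.5)   # reaches w = 2 fast, larger jitter
```

Inspecting the two trajectories shows the behavior described above: the small step size approaches the optimum slowly but with small residual fluctuation, while the large step size converges quickly yet keeps oscillating around the extremum (misadjustment).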
Many works in the literature have been devoted to making an optimal, or at least judicious, choice of the step size, aiming at fast convergence with low misadjustment. A classical approach consists in starting the iterations with a large value of μ and then decreasing it progressively as the algorithm converges. However, detecting whether the algorithm is approaching the right solution is a far from trivial problem, which often hinders the performance of this simple idea. Newton updates are in theory parameter-free, yet, as seen earlier in this chapter for the FastICA algorithm, they can actually reduce to constant step-size gradient iterations. A fact long unnoticed but pointed out in [20, 21] was recently developed in [71, 74, 76]: the shape of the kurtosis contrast enables the closed-form computation of the optimal step-size value at each extracting-vector update. The resulting algorithm is described next.
9.4.5.2 Algebraic Exact Line Search: The RobustICA Algorithm
A simple approach to addressing the trade-off set by the learning coefficient in gradient algorithms is exact line search. This optimization technique aims at the step size leading to the global optimum of the contrast along the current search direction:

    μ_opt = arg max_μ J_κ(w + μ d)        (9.30)
where d typically represents the gradient vector at w, or any other suitably chosen direction. In most cases, this one-dimensional optimization is costly, as it usually requires iterative numerical methods [52]. However, when the contrast is a rational function or a polynomial, the search for the optimal step size can be notably simplified [20, 21].
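As a minimal sketch of how such an algebraic line search can work, the fragment below applies the idea to the real-valued normalized kurtosis K(μ) = E{y⁴}/E{y²}² − 3 with y = (w + μd)ᵀx: along the search direction, the numerator of K'(μ) is a degree-4 polynomial whose roots give the stationary points in closed form. Variable names and the root-selection rule (largest absolute kurtosis among real roots) are illustrative assumptions, not taken verbatim from [74]:

```python
import numpy as np

def kurtosis_line_search(w, d, X):
    """One algebraic exact line-search step for the real-valued kurtosis
    contrast K(mu) = E{y^4}/E{y^2}^2 - 3, with y = (w + mu*d)^T X.
    Along the direction d, the stationary points of K are roots of a
    degree-4 polynomial in mu, so the optimal step is found in closed
    form.  Illustrative sketch; names are not from the original paper."""
    a, b = w @ X, d @ X
    # P(mu) = E{y^4}: a quartic in mu (coefficients, highest degree first)
    P = np.array([np.mean(b**4), 4 * np.mean(a * b**3),
                  6 * np.mean(a**2 * b**2), 4 * np.mean(a**3 * b),
                  np.mean(a**4)])
    # Q(mu) = E{y^2}: a quadratic in mu
    Q = np.array([np.mean(b**2), 2 * np.mean(a * b), np.mean(a**2)])
    # K'(mu) = 0  <=>  P'(mu) Q(mu) - 2 P(mu) Q'(mu) = 0  (degree 4 in mu)
    R = np.polysub(np.polymul(np.polyder(P), Q),
                   2 * np.polymul(P, np.polyder(Q)))
    roots = np.roots(R)
    real = roots[np.abs(roots.imag) < 1e-8].real

    def K(mu):  # normalized kurtosis along the search direction
        return np.polyval(P, mu) / np.polyval(Q, mu)**2 - 3.0

    # keep the real stationary point with the largest |kurtosis|
    mu_opt = real[np.argmax(np.abs(K(real)))]
    w_new = w + mu_opt * d
    return w_new / np.linalg.norm(w_new), mu_opt

# demo on an unwhitened 2x2 mixture of sub-Gaussian (uniform) sources
rng = np.random.default_rng(0)
S = rng.uniform(-1.0, 1.0, (2, 4000))
X = np.array([[1.0, 0.5], [0.3, 1.0]]) @ S
w_new, mu_opt = kurtosis_line_search(np.array([1.0, 0.0]),
                                     np.array([0.0, 1.0]), X)
```

Note that the demo deliberately skips prewhitening: as discussed below, the kurtosis remains a valid extraction contrast on the raw mixture, and the line-search step increases the absolute kurtosis of the extractor output.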
Although this property is satisfied by most functions based on fourth-order statistics, using the full version of the kurtosis contrast (9.9) presents some attractive advantages relative to simplified versions such as the fourth-order moment [74]:

• The kurtosis is a valid source extraction contrast even if prewhitening is not performed [64]. Avoiding prewhitening prevents the performance limitations imposed by this second-order processing step [8].

• The kurtosis is a valid contrast in both real- and complex-valued mixture scenarios, so that both cases can be treated without any modification. Both types of sources can appear simultaneously in a given mixture, and the mixing matrix entries can also be real or complex. Complex sources need not be circularly distributed.