often turns out that the algorithms can be improved by using a measure that takes even higher-order moments into account. Such a measure can, for example, be the negentropy, defined in definition 3.19 to be
$$J(y) := H(y_{\mathrm{gauss}}) - H(y).$$
As seen in section 3.3, the negentropy can indeed be used to measure deviation from the Gaussian: it is nonnegative and vanishes exactly for Gaussian variables, so the larger the negentropy, the "less Gaussian" the random variable.
Algorithm: (negentropy maximization) Maximize $w \mapsto J(w^{\top} z)$ on $S^{n-1}$ after whitening.
We can assume that the random variable $y$ has unit variance; since the unit-variance Gaussian has entropy $H(y_{\mathrm{gauss}}) = \frac{1}{2}(1 + \log 2\pi)$, we get
$$J(y) = \tfrac{1}{2}(1 + \log 2\pi) - H(y).$$
Hence negentropy maximization equals entropy minimization.
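As a concrete illustration of this formula (our own sketch, not part of the text), the differential entropy $H(y)$ can be estimated with a simple histogram plug-in estimator; the bin count and sample sizes below are arbitrary choices.

```python
import numpy as np

def entropy_hist(y, bins=100):
    # Plug-in differential entropy estimate from a histogram density;
    # the bin count is an arbitrary choice of this sketch.
    p, edges = np.histogram(y, bins=bins, density=True)
    widths = np.diff(edges)
    nz = p > 0
    return -np.sum(p[nz] * np.log(p[nz]) * widths[nz])

def negentropy(y):
    # J(y) = (1 + log(2*pi))/2 - H(y) for standardized y, as derived above.
    y = (y - y.mean()) / y.std()
    return 0.5 * (1.0 + np.log(2.0 * np.pi)) - entropy_hist(y)

rng = np.random.default_rng(0)
print(negentropy(rng.normal(size=200_000)))   # approximately 0 (Gaussian)
print(negentropy(rng.laplace(size=200_000)))  # clearly positive (super-Gaussian)
```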
In order to see a connection between the two Gaussianity measures, kurtosis and negentropy, a Taylor expansion of the negentropy can be used to get the approximation from equation (3.1):
$$J(y) = \frac{1}{12}\,E(y^3)^2 + \frac{1}{48}\,\operatorname{kurt}(y)^2 + \cdots.$$
If we assume that the third-order moments of $y$ vanish (for example, for symmetric sources), we see that kurtosis maximization indeed corresponds to a first approximation of the more general negentropy maximization.
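A minimal numerical sketch of this approximation (ours, not from the text) makes the point explicit: after standardizing $y$, the skewness term vanishes for symmetric sources, so the approximate negentropy is driven by the squared kurtosis alone.

```python
import numpy as np

def negentropy_moments(y):
    # Moment-based approximation from the Taylor expansion above:
    # J(y) ~ E(y^3)^2 / 12 + kurt(y)^2 / 48, for standardized y.
    y = (y - y.mean()) / y.std()
    skew_term = np.mean(y ** 3) ** 2 / 12.0
    kurt = np.mean(y ** 4) - 3.0  # excess kurtosis at unit variance
    return skew_term + kurt ** 2 / 48.0

rng = np.random.default_rng(0)
print(negentropy_moments(rng.normal(size=200_000)))   # ~ 0
print(negentropy_moments(rng.laplace(size=200_000)))  # dominated by the kurtosis term
```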
Other versions of gradient-ascent and fixed-point algorithms can now
easily be developed by using more general approximations [120] of the
negentropy.
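Reference [120] is not reproduced here; one widely used family of such approximations takes the form $J(y) \approx c\,(E[G(y)] - E[G(\nu)])^2$ for a standard Gaussian $\nu$ and a nonquadratic function $G$, for instance $G(u) = \log\cosh u$. A minimal sketch under that assumption (the constant $c$ is irrelevant for maximization and dropped; the Gaussian reference term is estimated by Monte Carlo):

```python
import numpy as np

def negentropy_logcosh(y, n_ref=1_000_000, seed=0):
    # One-unit contrast (E[G(y)] - E[G(nu)])^2 with G = log cosh;
    # nu ~ N(0, 1) is sampled because E[G(nu)] has no simple closed form.
    y = (y - y.mean()) / y.std()
    nu = np.random.default_rng(seed).normal(size=n_ref)
    g = lambda u: np.log(np.cosh(u))
    return (g(y).mean() - g(nu).mean()) ** 2
```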
Estimation of more than one component
So far we have estimated only one independent component (i.e., one row of $W$). How can the above algorithm be used to estimate the whole matrix?
By prewhitening, $W \in O(n)$, so the rows of the whitened demixing mapping $W$ are mutually orthogonal. The way to get the whole matrix $W$ using the above non-Gaussianity maximization is to iteratively search for components as follows.
Algorithm: (deflation FastICA algorithm) Perform fixed-point kurtosis maximization to estimate the first row of $W$; then estimate each further row in the same manner, restricted to the orthogonal complement of the rows already found.
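A minimal NumPy sketch of this deflationary scheme (our own illustration, not the book's code; it assumes the classical kurtosis fixed-point update $w \leftarrow E[z(w^{\top}z)^3] - 3w$, and all function names and the toy mixing below are hypothetical):

```python
import numpy as np

def whiten(x):
    # PCA whitening: z = V x with identity covariance; after this step
    # the demixing map restricted to z is orthogonal, as noted above.
    x = x - x.mean(axis=1, keepdims=True)
    d, e = np.linalg.eigh(np.cov(x))
    v = e @ np.diag(d ** -0.5) @ e.T
    return v @ x, v

def deflation_fastica(x, n_components, n_iter=200, seed=0):
    # Estimate the rows of W one at a time; Gram-Schmidt deflation keeps
    # each new row orthogonal to the rows already found.
    z, v = whiten(x)
    rng = np.random.default_rng(seed)
    rows = []
    for _ in range(n_components):
        w = rng.normal(size=z.shape[0])
        w /= np.linalg.norm(w)
        for _ in range(n_iter):
            # Fixed-point update for the kurtosis contrast:
            # w <- E[z (w^T z)^3] - 3 w (valid for ||w|| = 1).
            w_new = (z * (w @ z) ** 3).mean(axis=1) - 3.0 * w
            for u in rows:  # deflation: project out found components
                w_new -= (w_new @ u) * u
            w_new /= np.linalg.norm(w_new)
            converged = abs(abs(w_new @ w) - 1.0) < 1e-12
            w = w_new
            if converged:
                break
        rows.append(w)
    return np.vstack(rows) @ v  # demixing matrix W for the original x

# Toy usage: two non-Gaussian sources, random square mixing.
rng = np.random.default_rng(1)
s = np.vstack([rng.laplace(size=5000), rng.uniform(-1, 1, size=5000)])
w_est = deflation_fastica(rng.normal(size=(2, 2)) @ s, n_components=2)
```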