often turns out that the algorithms can be improved by using a measure
that takes even higher order moments into account. Such a measure can,
for example, be the negentropy, defined in definition 3.19 to be
J(y) := H(y_gauss) - H(y).
As seen in section 3.3, the negentropy can indeed be used to measure
deviation from the Gaussian. The larger the negentropy, the "less Gaussian" the random variable.
Algorithm: (negentropy maximization) Maximize w ↦ J(w^T z) on S^{n-1} after whitening.
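The algorithm assumes whitened data. As a minimal sketch in Python (assuming the observed mixtures are stored row-wise in an array X; the function name and the eigen-decomposition route are illustrative choices, not the text's notation), whitening could be done as follows:

import numpy as np

def whiten(X):
    """Whiten centered observations X (one mixture per row, one sample per
    column) so that the rows of the returned Z have identity covariance."""
    X = X - X.mean(axis=1, keepdims=True)        # center each mixture
    cov = np.cov(X)                              # sample covariance matrix
    d, E = np.linalg.eigh(cov)                   # cov = E diag(d) E^T
    V = E @ np.diag(1.0 / np.sqrt(d)) @ E.T      # whitening matrix V = E D^{-1/2} E^T
    Z = V @ X                                    # whitened data, cov(Z) ≈ I
    return Z, V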
We can assume that the random variable y has unit variance, so we get
J(y) = (1/2)(1 + log 2π) - H(y).
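The constant appearing here is simply the differential entropy of a unit-variance Gaussian, a standard fact used implicitly above:

H(y_gauss) = (1/2) log(2π e σ^2) = (1/2)(1 + log 2π)   for σ^2 = 1.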
Hence negentropy maximization equals entropy minimization.
In order to see a connection between the two Gaussianity measures kurtosis and negentropy, a Taylor expansion of the negentropy can be used to get the approximation from equation (3.1):
J(y) = (1/12) E(y^3)^2 + (1/48) kurt(y)^2 + ....
If we assume that the third-order moments of y vanish (for example, for symmetric sources), we see that kurtosis maximization indeed corresponds to a first approximation of the more general negentropy maximization.
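As an illustration, the following Python sketch evaluates this moment-based approximation on samples of a signal y (the function name and the standardization step are assumptions made here for safety, not part of the text):

import numpy as np

def negentropy_approx(y):
    """Approximate J(y) by (1/12) E(y^3)^2 + (1/48) kurt(y)^2 for a
    zero-mean, unit-variance signal y (standardized here for safety)."""
    y = (y - y.mean()) / y.std()          # enforce zero mean, unit variance
    m3 = np.mean(y**3)                    # third moment E(y^3)
    kurt = np.mean(y**4) - 3.0            # excess kurtosis of a unit-variance signal
    return m3**2 / 12.0 + kurt**2 / 48.0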
Other versions of gradient-ascent and fixed-point algorithms can now
easily be developed by using more general approximations [120] of the
negentropy.
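For instance, with the widely used log cosh contrast (only one possible choice; the exact approximations of [120] are not restated here), a one-unit fixed-point update on whitened data could be sketched as follows. The function name, the convergence test, and the optional decorrelation argument are assumptions made for illustration:

import numpy as np

def one_unit_fastica(Z, W_prev=None, max_iter=200, tol=1e-8, rng=None):
    """Estimate one row w of the whitened demixing matrix by a fixed-point
    iteration based on a log cosh negentropy approximation. Z holds the
    whitened signals, one per row, one sample per column. W_prev optionally
    holds rows found earlier; w is then kept orthogonal to them."""
    rng = np.random.default_rng(rng)
    w = rng.standard_normal(Z.shape[0])
    w /= np.linalg.norm(w)                               # start on the unit sphere S^{n-1}
    for _ in range(max_iter):
        y = w @ Z                                        # current projection w^T z
        g, g_prime = np.tanh(y), 1.0 - np.tanh(y)**2     # g = G' for G(y) = log cosh y
        w_new = (Z * g).mean(axis=1) - g_prime.mean() * w    # fixed-point update
        if W_prev is not None:                           # Gram-Schmidt decorrelation
            w_new -= W_prev.T @ (W_prev @ w_new)
        w_new /= np.linalg.norm(w_new)                   # back onto S^{n-1}
        if abs(abs(w_new @ w) - 1.0) < tol:              # converged up to sign
            return w_new
        w = w_new
    return w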
Estimation of more than one component
So far we have estimated only one independent component (i.e. one row
of W ). How can the above algorithm be used to estimate the whole
matrix?
By prewhitening, W ∈ O(n), so the rows of the whitened demixing mapping W are mutually orthogonal. The way to get the whole matrix W using the above non-Gaussianity maximization is to iteratively search for components as follows.
Algorithm: ( deflation FastICA algorithm ) Perform fixed-point kurto-
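As a rough sketch of the deflation idea (repeated one-unit estimation, with each new row decorrelated against the rows already found so that the estimated W has mutually orthogonal rows), one could write the following; it reuses the hypothetical one_unit_fastica helper from the sketch above rather than the kurtosis-based fixed-point step named in the algorithm:

import numpy as np

def deflation_ica(Z, n_components, rng=None):
    """Estimate n_components rows of the whitened demixing matrix W one by
    one; each new row is kept orthogonal to the previously found rows."""
    W = np.zeros((0, Z.shape[0]))
    for _ in range(n_components):
        w = one_unit_fastica(Z, W_prev=W, rng=rng)   # one-unit estimate, decorrelated
        W = np.vstack([W, w])
    return W                                         # rows are mutually orthogonal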