Digital Signal Processing Reference
function to optimize, most ICA algorithms can be equivalently restated in a "natural gradient" form (Amari 1999; Amari and Cardoso 1997). In such a setting,
the demixing matrix $B$ is estimated iteratively: $B(t+1) = B(t) + \mu \nabla_B(B(t))$. The "natural gradient" $\nabla_B$ at $B$ is given by

$$\nabla_B(B) = \left[\mathbf{I} - \frac{1}{N}\,\mathcal{H}(\hat{S})\,\hat{S}^{T}\right] B, \qquad (9.8)$$
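As a concrete illustration, the iterative update around equation (9.8) can be sketched as follows. This is a minimal sketch rather than the chapter's implementation: the tanh score function, step size, iteration count, and near-identity initialization are assumptions of mine (tanh is a common generic choice when the sources are leptokurtic).

```python
import numpy as np

def natural_gradient_ica(Y, mu=0.02, n_iter=3000, seed=0):
    """Minimal natural-gradient ICA iteration in the spirit of equation (9.8).

    Y is the (N_s, N) matrix of observations (square mixing assumed).
    The score function is taken here to be tanh (an assumption; the text
    leaves the choice of the score function open).
    """
    rng = np.random.default_rng(seed)
    n_s, N = Y.shape
    # Initialize the demixing matrix close to the identity.
    B = np.eye(n_s) + 0.01 * rng.standard_normal((n_s, n_s))
    for _ in range(n_iter):
        S_hat = B @ Y              # current source estimates, S_hat = B Y
        H = np.tanh(S_hat)         # score function H(S_hat) (assumed tanh)
        # Natural gradient of eq. (9.8): [I - (1/N) H(S_hat) S_hat^T] B
        grad = (np.eye(n_s) - (H @ S_hat.T) / N) @ B
        B = B + mu * grad          # update: B(t+1) = B(t) + mu * grad
    return B
```

On a toy mixture of two Laplacian (leptokurtic) sources, the product of the estimated demixing matrix with the true mixing matrix is close to a scaled permutation, as expected from the scale and permutation indeterminacies of ICA.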
where $\mathcal{H}(\hat{S})$ in equation (9.8) is the so-called score function, which is closely related to the pdf of the sources (Cichocki and Amari 2002; Amari and Cardoso 1997). The matrix $\hat{S}$ is the estimate of $S$: $\hat{S} = BY$. Assuming that all the sources are generated from the same joint pdf $\mathrm{pdf}_S$, the entries of $\mathcal{H}(\hat{S})$ are the partial derivatives of the log-likelihood function:

$$\mathcal{H}(\hat{S})[i,l] = -\frac{\partial \log\left(\mathrm{pdf}_S(\hat{S})\right)}{\partial \hat{S}[i,l]}, \quad (i,l) \in \{1,\ldots,N_s\} \times \{1,\ldots,N\}. \qquad (9.9)$$
As expected, the way the demixing matrix (and thus the sources) is estimated
closely depends on the way the sources are modeled (from a statistical point
of view). For instance, separating platykurtic (negative-kurtosis) or leptokurtic (positive-kurtosis) sources will require completely different score functions. Even if ICA is shown by Amari and Cardoso (1997) to be quite robust to so-called mismodeling, the choice of the score function is crucial for the convergence (and rate of convergence) of ICA algorithms. Some ICA-based techniques (Koldovsky et al. 2006) focus on adapting the popular FastICA algorithm to adjust the score function to the distribution
of the sources. They particularly focus on modeling sources whose distributions
belong to specific parametric classes such as the generalized Gaussian distribution (GGD).
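To make concrete how the score function encodes the source model, consider a simplified, unit-scale generalized Gaussian density $p(s) \propto \exp(-|s|^{\beta})$ (the parameterization and the shape-parameter name $\beta$ are my choices for illustration, not the chapter's). Its score then has a simple closed form:

```python
import numpy as np

def ggd_score(s, beta):
    """Score -d/ds log p(s) for the unit-scale generalized Gaussian
    p(s) proportional to exp(-|s|**beta):
        beta = 2 -> Gaussian-like score 2*s,
        beta = 1 -> Laplacian score sign(s) (leptokurtic modeling),
        beta < 1 -> even sparser, more sharply peaked source models.
    """
    s = np.asarray(s, dtype=float)
    return beta * np.sign(s) * np.abs(s) ** (beta - 1.0)
```

Varying $\beta$ interpolates between sub- and super-Gaussian source models, which is why adapting it (as in the FastICA variants cited above) changes the separation behavior.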
Noisy ICA: Only a few works have investigated the problem of noisy ICA
(Davies 2004; Koldovsky and Tichavsky 2006). As pointed out by Davies (2004),
noise clearly degrades the ICA model: it is no longer fully identifiable. In the case of additive Gaussian noise, as stated in equation (9.2), using higher-order statistics yields an effective estimate of the mixing matrix $A = B^{-1}$ (higher-order cumulants are indeed blind to additive Gaussian noise; this property does not hold for non-Gaussian noise). But in the noisy ICA setting, applying the demixing matrix
to the data does not yield an effective estimate of the sources. Furthermore, most
ICA algorithms assume the mixing matrix A to be square. When there are more
observations than sources ($N_c > N_s$), a dimension reduction step is first applied.
When noise perturbs the data, this subspace projection step can dramatically
deteriorate the performance of the separation stage.
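The blindness of higher-order cumulants to additive Gaussian noise is easy to check numerically. The sketch below (sample size, source, and noise level are illustrative choices of mine) estimates the fourth-order cumulant $\kappa_4(x) = \mathrm{E}[x^4] - 3\,\mathrm{E}[x^2]^2$ of a Laplacian variable before and after adding Gaussian noise:

```python
import numpy as np

def cum4(x):
    """Empirical fourth-order cumulant kappa_4 = E[x^4] - 3 E[x^2]^2
    (for centered x). It vanishes for Gaussian data, which is the source
    of its 'blindness' to additive Gaussian noise."""
    x = x - x.mean()
    return np.mean(x ** 4) - 3.0 * np.mean(x ** 2) ** 2

rng = np.random.default_rng(0)
s = rng.laplace(size=200_000)                # Laplacian: kappa_4 = 12 for unit scale
noise = 0.5 * rng.standard_normal(200_000)   # Gaussian: kappa_4 = 0
# kappa_4 is additive over independent variables and the Gaussian term is zero,
# so cum4(s + noise) matches cum4(s) up to sampling error, while cum4(noise) ~ 0.
```

This is exactly why the mixing matrix remains estimable from higher-order statistics in Gaussian noise, even though the sources themselves are not recovered by simply applying the demixing matrix.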
In the following, we will introduce a new way of modeling the data so as to avoid
most of the aforementioned limitations of ICA.
9.2.5 Toward Sparsity
The seminal paper of Zibulevsky and Pearlmutter (2001) introduced sparsity as an
alternative to standard contrast functions in ICA. In their work, each source s i was