where $\kappa$ is a small positive constant used for regularization. The minimal value of this problem is called the first kernel canonical correlation.
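For two components, the regularized problem in question takes the standard form used by Bach and Jordan (assumed here, since the equation itself is stated before this point) of a generalized eigenproblem:

$$
\begin{pmatrix}
\left(K_1 + \tfrac{N\kappa}{2} I\right)^{2} & K_1 K_2 \\
K_2 K_1 & \left(K_2 + \tfrac{N\kappa}{2} I\right)^{2}
\end{pmatrix}
\begin{pmatrix} \alpha_1 \\ \alpha_2 \end{pmatrix}
= \lambda
\begin{pmatrix}
\left(K_1 + \tfrac{N\kappa}{2} I\right)^{2} & 0 \\
0 & \left(K_2 + \tfrac{N\kappa}{2} I\right)^{2}
\end{pmatrix}
\begin{pmatrix} \alpha_1 \\ \alpha_2 \end{pmatrix},
$$

whose minimal generalized eigenvalue $\lambda_{\mathcal{F}}(K_1, K_2) \in (0, 1]$ is the quantity named above.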
The Kernel-ICA algorithm proceeds as follows. Given a set of data vectors $x_1, x_2, \ldots, x_N$, and given a parameter matrix $W$, we set $s_i = W x_i$ for each $i$, and thereby form a set of estimated source vectors $s_1, s_2, \ldots, s_N$. The $m$ components of these vectors yield a set of $m$ Gram matrices, $K_1, K_2, \ldots, K_m$, and these Gram matrices (which depend on $W$) define the contrast function $C(W) = \hat{I}_{\lambda_{\mathcal{F}}}(K_1, \ldots, K_m)$. The Kernel-ICA algorithm minimizes this function with respect to $W$.
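As an illustration of how this contrast can be evaluated, the following Python sketch computes $C(W)$ for $m = 2$ components under a Gaussian kernel. The function names, the choice of kernel, and the values of $\sigma$ and $\kappa$ are illustrative assumptions, not part of the original formulation.

```python
import numpy as np
from scipy.linalg import eigh

def gram_matrix(z, sigma=1.0):
    """Centered Gram matrix of one estimated component under a
    Gaussian kernel (the kernel and bandwidth are choices made here)."""
    N = len(z)
    D = z[:, None] - z[None, :]
    K = np.exp(-D**2 / (2.0 * sigma**2))
    H = np.eye(N) - np.ones((N, N)) / N        # centering matrix
    return H @ K @ H

def kernel_ica_contrast(W, X, kappa=2e-2, sigma=1.0):
    """C(W) for m = 2: minus half the log of the minimal generalized
    eigenvalue lambda_F (the first kernel canonical correlation)."""
    S = W @ X                                  # estimated sources, rows = components
    N = X.shape[1]
    K1, K2 = gram_matrix(S[0], sigma), gram_matrix(S[1], sigma)
    M1 = K1 + (N * kappa / 2.0) * np.eye(N)    # regularized Gram matrices
    M2 = K2 + (N * kappa / 2.0) * np.eye(N)
    A = np.block([[M1 @ M1, K1 @ K2], [K2 @ K1, M2 @ M2]])
    B = np.block([[M1 @ M1, np.zeros((N, N))], [np.zeros((N, N)), M2 @ M2]])
    lam_F = eigh(A, B, eigvals_only=True)[0]   # eigenvalues in ascending order
    return -0.5 * np.log(lam_F)
```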
The optimization technique used for Kernel-ICA is gradient descent (with line search) on an almost-everywhere differentiable function $C(W)$. The algorithm converges to a local minimum of $C(W)$ from any starting point. However, the ICA contrast functions have multiple local minima, and restarts are generally necessary to find the global optimum. Empirically, the number of restarts was found to be small when the number of samples was large enough to make the problem well-defined [48].
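A toy version of this restart strategy can be sketched for two whitened sources, where the orthogonal unmixing matrix reduces to a single rotation angle, so the gradient descent with line search is replaced here by a one-dimensional local search over $\theta$. The names `estimate_unmixing` and `rotation` and the restart count are hypothetical; `kernel_ica_contrast` refers to the sketch above.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def rotation(theta):
    """2x2 rotation; after whitening, any orthogonal unmixing matrix
    for two sources has this form (up to permutation and sign)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

def estimate_unmixing(X, n_restarts=4, seed=0):
    """Restart loop: run a local 1-D minimization of the contrast from
    several random starting angles and keep the best minimum found."""
    rng = np.random.default_rng(seed)
    best_val, best_W = np.inf, None
    for _ in range(n_restarts):
        t0 = rng.uniform(0.0, np.pi / 2)   # contrast has period pi/2 in theta
        res = minimize_scalar(lambda t: kernel_ica_contrast(rotation(t), X),
                              bracket=(t0, t0 + 0.3))
        if res.fun < best_val:
            best_val, best_W = res.fun, rotation(res.x)
    return best_W
```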
2.4 ICA Mixture Modelling
ICAMM is proposed in the framework of pattern recognition, considering that the observed data come from a mixture model and can be categorized into several mutually exclusive classes. ICAMM assumes that the underlying process that generated the observed data is composed of multiple ICA models (the data of each class are modelled as an ICA, i.e., linear combinations of independent non-Gaussian sources). This modelling has been proposed to deal with the problems of the widely used mixture of Gaussians (MoG) modelling [53]. The principal limitations of MoG are: (i) the size ($M^2$) of each covariance matrix becomes extremely large as the dimension ($M$) of the problem increases; and (ii) each component is Gaussian, a condition that is rarely met in real data sets.
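To make the class structure concrete, the following sketch evaluates the class posteriors of an observation under an ICA mixture with known parameters, assuming unit-scale Laplacian sources purely for illustration (ICAMM admits other source densities, as discussed below). The function name and parameter layout are hypothetical.

```python
import numpy as np

def icamm_log_posteriors(x, Ws, biases, priors):
    """Log class posteriors p(C_k | x) under an ICA mixture, assuming
    (for illustration only) unit-scale Laplacian sources in every class:
        log p(x | C_k) = log|det W_k| + sum_j log p_s(s_j),
    with s = W_k (x - b_k) for unmixing matrix W_k and bias b_k."""
    log_joint = []
    for W_k, b_k, pi_k in zip(Ws, biases, priors):
        s = W_k @ (x - b_k)
        log_px = np.linalg.slogdet(W_k)[1] + np.sum(-np.abs(s) - np.log(2.0))
        log_joint.append(np.log(pi_k) + log_px)
    log_joint = np.array(log_joint)
    return log_joint - np.logaddexp.reduce(log_joint)  # normalize in log space
```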
The antecedents of ICAMM can be found in [54], where each Gaussian of the mixture was replaced with a probabilistic principal component analysis (PPCA) model, allowing the dimension of the covariance matrix to be reduced while preserving the representation of the data. This PCA-based method was modified in [55] using variational Bayesian inference to infer the optimum number of analysers, yielding the so-called Mixture of Factor Analysers. Afterwards, a robust approach to PPCA that exploits the adaptive tails of the Student-t distribution was proposed [56, 57]; it prevents the performance of the method from being spoiled by non-Gaussian noise (e.g., outliers). ICA mixture modelling is thus the natural evolution of these antecedents.
ICAMM was introduced in [58] considering a source model that switches between Laplacian and bimodal densities. Afterwards, the model was extended using generalized exponential sources [59], self-similar areas such as mixtures of Gaussians sub-features using variational Bayesian inference [53], and sources with