enhancements in ICAMM such as semi-supervision, correction of residual dependencies, embedding of any ICA algorithm in the learning process, and non-parametric estimation of the source densities. Let us discuss some of the features of the proposed algorithm.
We have selected a kernel non-parametric estimator to estimate the pdf (see Eq. (3.7)). The kernel estimator has a closed-form expression so that subsequent analytical development is possible, as we have done in [5]. This is a well-studied and simple method which has the advantage that the estimated density can be easily guaranteed to be a true density, i.e., it is nonnegative and integrates to 1 [8]. Obviously, other alternatives having closed forms [9] would be readily applicable
in ICAMM by simply changing the corresponding expressions $p\left(\mathbf{s}_k^{(n)}\right)$ and $d \log p\left(\mathbf{s}_k^{(n)}\right)/d\mathbf{W}_k$. A comparison among different pdf estimators and their influence on
ICAMM is outside the scope of this work. Actually, there are not yet many works
devoted to comparing how different non-parametric pdf estimators influence the
performance of ICA.
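To make the closed-form property concrete, the following sketch (a minimal illustration assuming a Gaussian kernel with a fixed bandwidth h; the function names, the bandwidth choice, and the one-dimensional setting are our assumptions, not the book's notation) evaluates the kernel density estimate of a source and the analytic derivative of its logarithm:

```python
import numpy as np

def kde_pdf(u, samples, h):
    """Gaussian-kernel estimate of the source pdf evaluated at points u.

    samples : 1-D array of source estimates used as kernel centres
    h       : bandwidth (assumed fixed here for simplicity)
    """
    diff = (u[:, None] - samples[None, :]) / h
    kern = np.exp(-0.5 * diff**2) / np.sqrt(2.0 * np.pi)
    # averaging the kernels keeps the estimate nonnegative and integrating to 1
    return kern.mean(axis=1) / h

def kde_dlog_pdf(u, samples, h):
    """Analytic derivative d log p(u)/du of the kernel estimate.

    Because the Gaussian kernel has a closed form, the derivative of the
    estimated density (and of its logarithm) is also available in closed
    form, which is what makes subsequent analytical development possible.
    """
    diff = (u[:, None] - samples[None, :]) / h
    kern = np.exp(-0.5 * diff**2) / np.sqrt(2.0 * np.pi)
    p = kern.mean(axis=1) / h                   # estimated density
    dp = (-(diff / h) * kern).mean(axis=1) / h  # derivative of the density
    return dp / np.maximum(p, 1e-300)           # d log p = p' / p
```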
We have selected a gradient algorithm for optimization to iteratively search for the maximum likelihood solution. Gradient algorithms are simple to implement and have reasonably good convergence properties, particularly in combination with ad hoc techniques to avoid getting trapped in local minima. To this end, we used an annealing method in the implementation of the algorithm. The step size or learning rate was annealed during the adaptation process in order to provide faster and more reliable convergence. In addition, the learning rule of Eq. (3.8c) was used in the algorithm implementation in order to take advantage of the learning efficiency of the natural gradient technique. The natural gradient is based on differential geometry and employs knowledge of the Riemannian structure of the parameter space to adjust the gradient search direction. Furthermore, the natural gradient is asymptotically Fisher-efficient for maximum likelihood estimation [7].
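As a rough sketch of how such an update could look, the snippet below applies one natural-gradient step with an exponentially decaying learning rate; the score function phi and the decay schedule are illustrative placeholders and are not claimed to reproduce Eq. (3.8c):

```python
import numpy as np

def natural_gradient_step(W_k, x, phi, lr):
    """One natural-gradient ICA update of the de-mixing matrix W_k.

    x   : (dim, N) batch of observations currently attributed to class k
    phi : score function, phi(s) ~ -d log p(s)/ds, applied elementwise
          (in ICAMM it would be derived from the kernel density estimate)
    lr  : current (annealed) learning rate
    """
    s = W_k @ x                            # source estimates for this class
    n = x.shape[1]
    # Amari-style natural gradient: dW = lr * (I - E[phi(s) s^T]) W
    grad = np.eye(W_k.shape[0]) - (phi(s) @ s.T) / n
    return W_k + lr * grad @ W_k

def annealed_learning_rate(lr0, iteration, decay=0.99):
    """Illustrative annealing schedule: a step size that shrinks over the
    iterations, large at the start for fast learning and small near the
    optimum to stabilise convergence."""
    return lr0 * decay ** iteration
```

In the mixture setting, each per-sample contribution to this step is additionally weighted by the class posterior, which is what Eq. (3.13) below makes explicit.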
Alternatives to gradient algorithms are possible in ICAMM, but we think it is
more interesting to understand to what extent the different convergence analyses
and experiments previously considered in ICA [10, 11] generalize to ICAMM.
Note that, as indicated in step 3 of the iterative algorithm in Sect. 3.3.3, the updating increment of $\mathbf{W}_k$ in every iteration is a weighted sum of the separate increments due to every training sample vector, where the weight of each increment is the computed probability that the corresponding training vector belongs to class $C_k$. We can write the increment in every iteration in the form
$$
\begin{aligned}
\Delta \mathbf{W}_k(i) ={}& \sum_{m} \Delta_{\mathrm{ICA}\,\mathbf{W}_k}^{(m)}(i)\, p\bigl(C_k / \mathbf{x}^{(m)}; \mathbf{W}(i)\bigr) + \sum_{l} \Delta_{\mathrm{ICA}\,\mathbf{W}_k}^{(l)}(i)\, p\bigl(C_k / \mathbf{x}^{(l)}; \mathbf{W}(i)\bigr) \\
={}& \sum_{m} \Delta_{\mathrm{ICA}\,\mathbf{W}_k}^{(m)}(i) + \biggl[ -\sum_{m} \Delta_{\mathrm{ICA}\,\mathbf{W}_k}^{(m)}(i)\,\bigl(1 - p\bigl(C_k / \mathbf{x}^{(m)}; \mathbf{W}(i)\bigr)\bigr) \\
&\qquad + \sum_{l} \Delta_{\mathrm{ICA}\,\mathbf{W}_k}^{(l)}(i)\, p\bigl(C_k / \mathbf{x}^{(l)}; \mathbf{W}(i)\bigr) \biggr] \\
={}& \sum_{m} \Delta_{\mathrm{ICA}\,\mathbf{W}_k}^{(m)}(i) + r_k(i)
\end{aligned}
\tag{3.13}
$$
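Operationally, Eq. (3.13) says that the update of $\mathbf{W}_k$ is a posterior-weighted sum of per-sample ICA increments, which can be rewritten as the plain sum of the increments of the class-$C_k$ vectors plus a residual term $r_k(i)$. A minimal sketch of that computation (assuming, as the split suggests, that the index m runs over training vectors labelled as class $C_k$ and l over the remaining ones; all names are illustrative):

```python
import numpy as np

def icamm_increment(delta_ica, posteriors, labels, k):
    """Posterior-weighted increment of W_k, in the spirit of Eq. (3.13).

    delta_ica  : (N, dim, dim) array; delta_ica[n] is the ICA increment
                 computed from training vector x^(n) at iteration i
    posteriors : (N,) array with p(C_k / x^(n); W(i)) for every vector
    labels     : (N,) array of class labels (used only to form the split)
    k          : index of the class whose de-mixing matrix is updated
    """
    # weighted sum over all training vectors (first line of Eq. (3.13))
    dW_k = np.einsum('n,nij->ij', posteriors, delta_ica)

    # equivalent split: plain sum over the class-k vectors plus a residual
    in_k = labels == k
    main = delta_ica[in_k].sum(axis=0)
    r_k = (-np.einsum('n,nij->ij', 1.0 - posteriors[in_k], delta_ica[in_k])
           + np.einsum('n,nij->ij', posteriors[~in_k], delta_ica[~in_k]))
    assert np.allclose(dW_k, main + r_k)   # both forms agree
    return dW_k
```

Under this reading, $r_k(i)$ vanishes when the posteriors are exactly one for the class-$C_k$ vectors and zero for the rest, i.e., when every training vector is assigned unambiguously to its class.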