likelihood providing a good compromise between convergence speed and computational load. The final convergence results were consistent for all the supervision ratios, and the log-likelihood values were ordered by sr. In addition, we verified that the distances (measured by MSE) between the estimated centroids $\hat{\mathbf{b}}_k$, $k = 1, \ldots, 3$, and the original centroids of the ICA mixtures decrease with higher supervision.
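As a simple illustration of this check, a minimal sketch of the centroid-distance computation is given below, assuming the estimated and original centroids are stored as arrays with classes already matched; the function name and shapes are ours, not part of the Mixca implementation.

```python
import numpy as np

def centroid_mse(estimated, original):
    """Mean squared error between estimated and original class centroids.

    estimated, original: arrays of shape (K, n_dims), one centroid per class,
    with the class ordering of the estimates already matched to the originals.
    """
    return float(np.mean(np.sum((estimated - original) ** 2, axis=1)))

# Hypothetical usage: one set of estimates per supervision ratio
# mse_by_sr = {sr: centroid_mse(estimates[sr], true_centroids)
#              for sr in (0, 0.1, 0.3, 0.5, 0.7, 1)}
```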
The above results demonstrate that the perturbation introduced by $r_k(i)$, $k = 1, \ldots, K$, due to unlabelled data affects the convergence properties in the learning of the class parameters. This residual increment affects the cases with the lowest supervision the most. For the highest supervision ratios, the convergence depends on the algorithm used to update the ICA parameters of the classes, as discussed in Sect. 3.3.5.
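To make the origin of this perturbation concrete, the sketch below shows how class responsibilities could be computed in a generic semi-supervised mixture update: labelled observations receive hard 0/1 responsibilities, whereas unlabelled ones receive Bayes posteriors, which is where the residual term enters. This is an illustrative reconstruction under our own naming and shape assumptions, not the exact Mixca update.

```python
import numpy as np

def responsibilities(log_px_given_k, log_priors, labels):
    """Posterior class responsibilities r_k(i) for a semi-supervised mixture.

    log_px_given_k: (N, K) class-conditional log-densities log p(x_i | C_k).
    log_priors:     (K,) log mixing proportions.
    labels:         (N,) class index for labelled observations, -1 if unlabelled.
    """
    log_joint = log_px_given_k + log_priors                       # (N, K)
    log_joint = log_joint - log_joint.max(axis=1, keepdims=True)  # numerical stability
    r = np.exp(log_joint)
    r /= r.sum(axis=1, keepdims=True)          # Bayes posteriors for unlabelled data
    labelled = labels >= 0
    r[labelled] = 0.0                          # labelled data: hard 0/1 responsibilities
    r[labelled, labels[labelled]] = 1.0
    return r
```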
The classification and BSS results for sr ≥ 0.3 achieved the correct solution, and the results for sr < 0.3 were close to the correct solution. The maximum difference across supervision ratios was between the unsupervised case and the fully supervised case (0.176 in log-likelihood, 8.6 dB in SIR, and 28.3% in classification accuracy).
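For reference, one simple way to compute a per-source SIR in dB is sketched below: each estimated source is projected onto its true counterpart (resolving the ICA scale ambiguity) and the residual is treated as interference. This is our own simplified metric; the exact SIR definition behind the reported figures may differ.

```python
import numpy as np

def sir_db(true_source, estimate):
    """Signal-to-interference ratio (dB) of one recovered source.

    Assumes the permutation ambiguity has been resolved, i.e. the estimate
    has already been matched to its true source.
    """
    s = true_source - true_source.mean()
    y = estimate - estimate.mean()
    target = (y @ s) / (s @ s) * s        # component of y aligned with s
    interference = y - target
    return 10.0 * np.log10(np.sum(target ** 2) / np.sum(interference ** 2))
```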
Differences of this size can be critical depending on the specific application, and they underscore the importance of incorporating semi-supervised learning in ICAMM in order to take advantage of a partial labelling of the data (see Sect. 3.4.5). In addition, we repeated this experiment changing the embedded ICA algorithm used for parameter updating to standard algorithms such as JADE and FastICA. In general, the SIR and classification accuracy results were comparable for all the embedded algorithms. The efficiency of JADE and FastICA in the separation of super-Gaussian sources is well known; for this kind of source, the kernel density estimation obtained similar results. However, in terms of log-likelihood, the non-parametric Mixca converged to the highest values over a range of sr (0.3-1), whereas Mixca-JADE and Mixca-FastICA only converged to the highest log-likelihood values in the supervised case. Thus, these latter algorithms produced more cases of intermediate log-likelihood values.
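As a point of reference for the embedded algorithms, the standalone snippet below separates super-Gaussian (Laplacian) sources with scikit-learn's FastICA. It is not the Mixca embedding itself, and JADE is omitted because it has no standard scikit-learn implementation; the data sizes and seeds are arbitrary.

```python
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(0)
S = rng.laplace(size=(1000, 4))     # super-Gaussian (Laplacian) sources
A = rng.normal(size=(4, 4))         # random square mixing matrix
X = S @ A.T                         # observed linear mixtures

ica = FastICA(n_components=4, random_state=0)
S_hat = ica.fit_transform(X)        # recovered sources, up to permutation and scale
```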
In the second experiment, we measured the approximate number of observation vectors required by the Mixca procedure to achieve particular mean SIRs. A total of 400 Monte Carlo simulations were generated with the following parameters: (i) Number of classes in the ICA mixture K = 2; (ii) Number of observation vectors per class N = 100, 200, 300, 400, 500; (iii) Number of sources = 4 (Laplacians with a sharp peak at the bias and heavy tails); (iv) Supervision ratio (sr) = 0, 0.1, 0.3, 0.5, 0.7, 1; (v) Embedded ICA algorithm = Non-parametric Mixca.
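A rough sketch of how such synthetic ICA-mixture data could be generated is given below; the mixing matrices, centroid scale, and labelling scheme are illustrative assumptions rather than the exact simulation settings.

```python
import numpy as np

def make_ica_mixture(n_per_class=200, n_sources=4, K=2, sr=0.3, seed=0):
    """Synthetic ICA-mixture data loosely matching the experiment setup.

    Each class has its own random mixing matrix and bias (centroid); sources
    are Laplacian (sharp peak at the bias, heavy tails). A fraction sr of each
    class is labelled with its class index; the rest is marked -1 (unlabelled).
    """
    rng = np.random.default_rng(seed)
    X, labels = [], []
    for k in range(K):
        S = rng.laplace(size=(n_per_class, n_sources))
        A = rng.normal(size=(n_sources, n_sources))
        b = rng.normal(scale=5.0, size=n_sources)        # class centroid (bias)
        X.append(S @ A.T + b)
        y = np.full(n_per_class, -1)
        idx = rng.choice(n_per_class, int(round(sr * n_per_class)), replace=False)
        y[idx] = k
        labels.append(y)
    return np.vstack(X), np.concatenate(labels)
```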
Figure 3.6a, b shows the detailed and mean results of the second experiment. Figure 3.6b depicts the SIR curves obtained for different numbers of observation vectors, with one curve per supervision ratio used in the training stage. The number of observation vectors required to obtain a particular SIR value increased as supervision decreased. In general, the results demonstrate that the non-parametric Mixca procedure is able to achieve a good SIR with a small number of observation vectors; e.g., 20 dB of SIR was obtained with only 203, 215, 231, 258, and 306 observation vectors for sr = 1, 0.7, 0.5, 0.3, and 0.1, respectively. The results of this experiment confirm that the convergence efficiency of the proposed procedure increases significantly when only a small