A proximity or similarity measure is the basis for most clustering algorithms.
This measure between clusters at one level of the hierarchy (also referred to as
a distance) is used to determine which of them will be merged. The distance between
two clusters can be estimated from pairs of data objects, one from each cluster,
or from probabilistic relationships between the data densities of the two clusters.
The following general recurrence formula for a distance function D(·,·) was
proposed in [4]:
D(C_l, C_i ∪ C_j) = a_i D(C_l, C_i) + a_j D(C_l, C_j) + b D(C_i, C_j) + c |D(C_l, C_i) − D(C_l, C_j)|    (4.1)
Equation (4.1) describes the distance between a cluster l and the new cluster formed
by merging two clusters i and j. By manipulating the coefficients a_i, a_j, b,
and c, several hierarchical clustering algorithms based on distances between
data objects can be derived. Note that if a_i = a_j = 1/2, b = 0, and c = −1/2,
Eq. (4.1) becomes

D(C_l, C_i ∪ C_j) = min(D(C_l, C_i), D(C_l, C_j)),

which corresponds to the single linkage method. In the case that a_i = a_j = c = 1/2
and b = 0, Eq. (4.1) becomes

D(C_l, C_i ∪ C_j) = max(D(C_l, C_i), D(C_l, C_j)),

which corresponds to the complete linkage method [5].
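To make the recurrence concrete, the following minimal Python sketch applies the update of Eq. (4.1) and checks the two coefficient settings above; the function name and the toy distance values are illustrative assumptions, not taken from the cited papers.

def lance_williams_update(d_li, d_lj, d_ij, a_i, a_j, b, c):
    """Distance from cluster l to the cluster formed by merging i and j (Eq. 4.1)."""
    return a_i * d_li + a_j * d_lj + b * d_ij + c * abs(d_li - d_lj)

# Toy pairwise distances (illustrative values only).
d_li, d_lj, d_ij = 2.0, 5.0, 3.0

# Single linkage: a_i = a_j = 1/2, b = 0, c = -1/2  ->  min(d_li, d_lj)
assert lance_williams_update(d_li, d_lj, d_ij, 0.5, 0.5, 0.0, -0.5) == min(d_li, d_lj)

# Complete linkage: a_i = a_j = c = 1/2, b = 0  ->  max(d_li, d_lj)
assert lance_williams_update(d_li, d_lj, d_ij, 0.5, 0.5, 0.0, 0.5) == max(d_li, d_lj)

Because the update needs only the distances involving clusters i, j, and l, an agglomerative algorithm can maintain a distance matrix and apply this rule once per merge, without revisiting the original data objects.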
The probabilistic approaches to hierarchical clustering use model-based
criteria or Bayesian hypotheses to decide on merging clusters, rather than
an ad hoc distance metric. Broadly, there are two approaches to deriving the
hierarchy: hierarchical generative modelling of the data, and hierarchical ways of
organizing nested clusters. Methods of the first approach include hierarchical
generative models based on Gaussians [6], diffusion processes [7], and mutation
processes [8]; the first two can be used for inference, and the last for
semi-supervised learning. In [9], an agglomerative algorithm for merging Gaussian
mixtures is presented: it generates a virtual sample from the model at one level
and uses EM to find the mixture-model parameters for the next level that best
explain that virtual sample.
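As a rough illustration of this virtual-sample idea (not the actual parameter updates of [9]), one can draw a sample from the fitted mixture at one level and fit a coarser mixture to it with EM. The sketch below assumes scikit-learn's GaussianMixture; the data, component counts, and sample sizes are illustrative.

import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

# Fit a fine-level mixture to some data (four well-separated 2-D blobs; illustrative).
X = np.concatenate([rng.normal(m, 0.5, size=(100, 2)) for m in (0.0, 3.0, 6.0, 9.0)])
fine = GaussianMixture(n_components=4, random_state=0).fit(X)

# Generate a virtual sample from the fine-level model, then run EM with fewer
# components so that the coarser level best explains the virtual sample.
virtual, _ = fine.sample(2000)
coarse = GaussianMixture(n_components=2, random_state=0).fit(virtual)
print(coarse.means_)

Repeating this step with progressively fewer components yields the levels of the hierarchy from the bottom up.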
Methods of the second approach include: agglomerative model merging based on
marginal likelihoods in the context of HMMs [10]; a method that computes the
marginal likelihood for c and c−1 clusters for use in an agglomerative
algorithm [11]; clustering of multinomial feature-vector data that considers
subsets of features having common distributions [12]; probabilistic abstraction
hierarchies in which each node contains a probabilistic model, with the most
similar models (as estimated by a distance function) placed at neighbouring
nodes [13]; an agglomerative algorithm that merges time series by greedily
maximizing the marginal likelihood [14, 15]; and Bayesian hypothesis testing
with marginal likelihoods to decide which clusters to merge, when to stop, and
how to avoid overfitting [16]. The model of this last method can also be used to
compute the predictive distribution of a test point and the probability that it
belongs to any of the existing clusters in the tree.
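As a simplified sketch of the marginal-likelihood merge criterion (a bare-bones variant of the idea in [16], not the full tree-consistent algorithm), consider one-dimensional Gaussian data with known variance and a conjugate Gaussian prior on the mean, for which the marginal likelihood has a closed form. The parameter values, function names, and greedy loop below are all illustrative assumptions.

import numpy as np
from scipy.stats import norm

SIGMA2, TAU2, MU0 = 1.0, 10.0, 0.0  # assumed noise variance, prior variance, prior mean

def log_marginal(x):
    # log p(x) with x_i ~ N(mu, SIGMA2) and mu ~ N(MU0, TAU2) integrated out.
    x = np.asarray(x, dtype=float)
    n, xbar = len(x), x.mean()
    s = ((x - xbar) ** 2).sum()
    return (-0.5 * n * np.log(2 * np.pi * SIGMA2)
            - s / (2 * SIGMA2)
            + 0.5 * np.log(2 * np.pi * SIGMA2 / n)
            + norm.logpdf(xbar, MU0, np.sqrt(SIGMA2 / n + TAU2)))

def greedy_bayes_merge(points):
    # Merge the pair with the largest marginal-likelihood gain; stop when the
    # evidence favours keeping every remaining pair of clusters separate.
    clusters = [[p] for p in points]
    while len(clusters) > 1:
        best_gain, best_pair = -np.inf, None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                gain = (log_marginal(clusters[i] + clusters[j])
                        - log_marginal(clusters[i]) - log_marginal(clusters[j]))
                if gain > best_gain:
                    best_gain, best_pair = gain, (i, j)
        if best_gain <= 0:
            break
        i, j = best_pair
        clusters[i] = clusters[i] + clusters.pop(j)
    return clusters

print(greedy_bayes_merge([0.1, -0.2, 0.0, 5.0, 5.3, 4.9]))
# Expected to recover two clusters: one near 0 and one near 5.

The gain compares the hypothesis that two clusters were generated by a single model against the hypothesis that they were generated independently, which is what provides a principled stopping rule: merging halts as soon as no pair has positive evidence in favour of merging.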
Work on clustering related to hierarchies derived from ICA can be found in a
hierarchical latent variable model for data visualization proposed