A proximity or similarity measure is the basis for most clustering algorithms.
This measure between clusters at one level of the hierarchy (also referred to as
a distance) is used to determine which of them will be merged. The distance between
two clusters can be estimated from pairs of data objects, one from each cluster,
or from probabilistic relationships between the data densities of the two clusters.
The following general recurrence formula for a distance function D(·,·) was
proposed in [4]:
D(C_l, C_i ∪ C_j) = a_i D(C_l, C_i) + a_j D(C_l, C_j) + b D(C_i, C_j) + c |D(C_l, C_i) − D(C_l, C_j)|    (4.1)
Equation (4.1) describes the distance between a cluster l and the new cluster formed
by merging two clusters i and j. By manipulating the coefficients a_i, a_j, b,
and c, several hierarchical clustering algorithms based on distances between
data objects can be derived. Note that if a_i = a_j = 1/2, b = 0, and c = −1/2,
Eq. (4.1) becomes

D(C_l, C_i ∪ C_j) = min(D(C_l, C_i), D(C_l, C_j)),

which corresponds to the single linkage method. In the case that a_i = a_j = c = 1/2
and b = 0, Eq. (4.1) becomes

D(C_l, C_i ∪ C_j) = max(D(C_l, C_i), D(C_l, C_j)),

which corresponds to the complete linkage method [5].
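To make the recurrence concrete, the following minimal Python sketch applies the update of Eq. (4.1) and checks the two coefficient settings above; the function name and the toy distance values are illustrative assumptions, not taken from the cited papers.

def lance_williams_update(d_li, d_lj, d_ij, a_i, a_j, b, c):
    """Distance from cluster l to the cluster formed by merging i and j (Eq. 4.1)."""
    return a_i * d_li + a_j * d_lj + b * d_ij + c * abs(d_li - d_lj)

# Toy pairwise distances (illustrative values only).
d_li, d_lj, d_ij = 2.0, 5.0, 3.0

# Single linkage: a_i = a_j = 1/2, b = 0, c = -1/2  ->  min(d_li, d_lj)
assert lance_williams_update(d_li, d_lj, d_ij, 0.5, 0.5, 0.0, -0.5) == min(d_li, d_lj)

# Complete linkage: a_i = a_j = c = 1/2, b = 0  ->  max(d_li, d_lj)
assert lance_williams_update(d_li, d_lj, d_ij, 0.5, 0.5, 0.0, 0.5) == max(d_li, d_lj)

Because the update needs only the distances involving clusters i, j, and l, an agglomerative algorithm can maintain a distance matrix and apply this rule once per merge, without revisiting the original data objects.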
The probabilistic approaches to hierarchical clustering use model-based
criteria or Bayesian hypotheses to decide on merging clusters, rather than
an ad hoc distance metric. Broadly, there are two approaches to deriving the
hierarchy: hierarchical generative modelling of the data, and hierarchical ways of
organizing nested clusters. Methods of the first approach include hierarchical
generative models based on Gaussians [6], diffusion processes [7], and mutation
processes [8]; the first two can be used for inference, and the last for
semi-supervised learning. In [9], an agglomerative algorithm for merging Gaussian
mixtures is presented: it generates a virtual sample from the model at one level
and uses EM to find the mixture-model parameters for the next level that best
explain that virtual sample.
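As a rough illustration of this virtual-sample idea (not the actual parameter updates of [9]), one can draw a sample from the fitted mixture at one level and fit a coarser mixture to it with EM. The sketch below assumes scikit-learn's GaussianMixture; the data, component counts, and sample sizes are illustrative.

import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

# Fit a fine-level mixture to some data (four well-separated 2-D blobs; illustrative).
X = np.concatenate([rng.normal(m, 0.5, size=(100, 2)) for m in (0.0, 3.0, 6.0, 9.0)])
fine = GaussianMixture(n_components=4, random_state=0).fit(X)

# Generate a virtual sample from the fine-level model, then run EM with fewer
# components so that the coarser level best explains the virtual sample.
virtual, _ = fine.sample(2000)
coarse = GaussianMixture(n_components=2, random_state=0).fit(virtual)
print(coarse.means_)

Repeating this step with progressively fewer components yields the levels of the hierarchy from the bottom up.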
Methods of the second approach include: agglomerative model merging based on
marginal likelihoods in the context of HMMs [10]; a method that computes the
marginal likelihood for c and c−1 clusters for use in an agglomerative
algorithm [11]; clustering of multinomial feature-vector data that considers
subsets of features having common distributions [12]; probabilistic abstraction
hierarchies in which each node contains a probabilistic model, with the most
similar models (as estimated by a distance function) placed at neighbouring
nodes [13]; an agglomerative algorithm that merges time series by greedily
maximizing the marginal likelihood [14, 15]; and Bayesian hypothesis testing
with marginal likelihoods to decide which clusters to merge, when to stop, and
how to avoid overfitting [16]. The model of this last method can also be used to
compute the predictive distribution of a test point and the probability that it
belongs to any of the existing clusters in the tree.
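As a simplified sketch of the marginal-likelihood merge criterion (a bare-bones variant of the idea in [16], not the full tree-consistent algorithm), consider one-dimensional Gaussian data with known variance and a conjugate Gaussian prior on the mean, for which the marginal likelihood has a closed form. The parameter values, function names, and greedy loop below are all illustrative assumptions.

import numpy as np
from scipy.stats import norm

SIGMA2, TAU2, MU0 = 1.0, 10.0, 0.0  # assumed noise variance, prior variance, prior mean

def log_marginal(x):
    # log p(x) with x_i ~ N(mu, SIGMA2) and mu ~ N(MU0, TAU2) integrated out.
    x = np.asarray(x, dtype=float)
    n, xbar = len(x), x.mean()
    s = ((x - xbar) ** 2).sum()
    return (-0.5 * n * np.log(2 * np.pi * SIGMA2)
            - s / (2 * SIGMA2)
            + 0.5 * np.log(2 * np.pi * SIGMA2 / n)
            + norm.logpdf(xbar, MU0, np.sqrt(SIGMA2 / n + TAU2)))

def greedy_bayes_merge(points):
    # Merge the pair with the largest marginal-likelihood gain; stop when the
    # evidence favours keeping every remaining pair of clusters separate.
    clusters = [[p] for p in points]
    while len(clusters) > 1:
        best_gain, best_pair = -np.inf, None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                gain = (log_marginal(clusters[i] + clusters[j])
                        - log_marginal(clusters[i]) - log_marginal(clusters[j]))
                if gain > best_gain:
                    best_gain, best_pair = gain, (i, j)
        if best_gain <= 0:
            break
        i, j = best_pair
        clusters[i] = clusters[i] + clusters.pop(j)
    return clusters

print(greedy_bayes_merge([0.1, -0.2, 0.0, 5.0, 5.3, 4.9]))
# Expected to recover two clusters: one near 0 and one near 5.

The gain compares the hypothesis that two clusters were generated by a single model against the hypothesis that they were generated independently, which is what provides a principled stopping rule: merging halts as soon as no pair has positive evidence in favour of merging.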
Work on clustering related to hierarchies derived from ICA can be found in a
hierarchical latent variable model for data visualization proposed