DBScan [69] and Mean Shift [77, 40, 44, 45] algorithms. Mean Shift has produced
very good results in image segmentation and computer vision applications.
In a later section we will present experimental results comparing clustering
algorithms of the above types with LEGClust.
6.4.2.1 Clustering with Entropy
Clustering algorithms applying concepts of entropy, mutual information and
the Kullback-Leibler divergence have been proposed by several authors. Ex-
amples of such algorithms are minimum entropic clustering [137], entropic
spanning graphs clustering [98] and entropic subspace clustering [39]. In other
works entropy is used as a measure of proximity or interrelation between clus-
ters. Examples are the algorithms proposed by Jenssen [117] and
Gokcay [84], which use a so-called Between-Cluster Entropy, and the one pro-
posed by Lee [134, 135], which uses the Within-Cluster Association. A common
characteristic of these algorithms is that they start by selecting random seeds
for the first clusters, which may produce very different final clustering
solutions. Recent works by He [97], Vretos [234] and Faivishevsky [71]
are examples of clustering algorithms using mutual information.
The main drawbacks of these entropy-based algorithms are their high
computational complexity and long running times. They are, nonetheless,
attractive because they produce good results in several specific applica-
tions.
The LEGClust algorithm described in the following sections uses Rényi's
quadratic entropy, $H_{R_2}$, which was seen in the previous chapters to be com-
putationally simpler to use than $H_S$ or $H_{R_\alpha}$ with $\alpha \neq 2$. One could, however, use
other entropic measures as well.
Rényi's quadratic entropy was already discussed in earlier chapters. LEG-
Clust uses the multivariate version of formula (F.10):
$$H_{R_2} = -\ln \frac{1}{N^2} \sum_{i=1}^{N} \sum_{j=1}^{N} g(x_i - x_j;\, 0,\, 2h^2 I), \qquad (6.49)$$
where $x_i$ and $x_j$ are the data vectors.
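As an illustration, the estimate in (6.49) can be computed directly from the pairwise differences of the data vectors. The sketch below is not part of LEGClust's reference implementation; it simply evaluates the Parzen-window estimate of $H_{R_2}$ with the multivariate Gaussian kernel $g(\cdot;\, 0,\, 2h^2 I)$, assuming the bandwidth $h$ is supplied by the user.

```python
import numpy as np

def renyi_quadratic_entropy(X, h):
    """Parzen-window estimate of Renyi's quadratic entropy, as in formula (6.49).

    X : (N, d) array whose rows are the data vectors x_i.
    h : kernel bandwidth (assumed to be chosen beforehand).
    """
    N, d = X.shape
    # Pairwise squared Euclidean distances ||x_i - x_j||^2.
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    # Multivariate Gaussian kernel g(x_i - x_j; 0, 2 h^2 I).
    var = 2.0 * h ** 2
    kernels = np.exp(-sq_dists / (2.0 * var)) / (2.0 * np.pi * var) ** (d / 2.0)
    # Information potential: average of all N^2 pairwise kernel values.
    information_potential = kernels.sum() / N ** 2
    return -np.log(information_potential)
```

The double sum over all pairs is the quadratic information potential; its negative logarithm gives the entropy estimate. For example, renyi_quadratic_entropy(X, h=0.5) on an (N, d) NumPy array X returns a single scalar value of $H_{R_2}$.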
6.4.3 Dissimilarity Matrix
As stated earlier, clustering algorithms are based on similarity or dissimilarity
measures between the objects (data instances) of a set.
Objects belonging to the same partition or cluster possess a higher degree
of similarity with each other than with objects belonging to other clusters.
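A common way to organize such measures is a pairwise dissimilarity matrix. The sketch below builds one using the plain Euclidean distance; this is only a generic example, not the specific proximity measure used by LEGClust, which is derived in the sections that follow.

```python
import numpy as np

def dissimilarity_matrix(X):
    """Pairwise Euclidean dissimilarity matrix of a data set X (N x d).

    D[i, j] holds the distance between objects x_i and x_j;
    the matrix is symmetric with a zero diagonal.
    """
    sq_norms = np.sum(X ** 2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T
    # Clip tiny negative values caused by floating-point round-off.
    return np.sqrt(np.maximum(sq_dists, 0.0))
```

Any other metric could be substituted for the Euclidean distance; the essential properties are symmetry and a zero diagonal, so that D[i, j] grows as objects i and j become less alike.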
 