Fig. 6.33 Some clustering solutions suggested by Chameleon. The considered
values nc, a and n are shown in each label: (a) dataset13, nc = 4, a = 20,
n = 20; (b) dataset13, nc = 3, a = 20, n = 20; (c) dataset34, nc = 2, a = 50,
n = 6.
We now report on the experiments with real-world datasets described in
[204].
The DHN dataset consists of 2000 images of handwritten numerals ('0'-
'9') extracted from a collection of Dutch utility maps [60]. A sample of this
dataset is depicted in Fig. 6.36. In this dataset, the first two features represent
the pixel position and the third one, the gray level. Experiments with this
dataset were performed with LEGClust and Spectral clustering.
Results are presented in Table 6.21. ARI stands for Adjusted Rand Index, a
measure for comparing the results of different clustering solutions when the
labels are known [107]. This index is an improvement of the Rand Index [180]
that corrects for chance agreement: it is at most 1, is close to 0 for random
labelings (and can be slightly negative), and the higher the ARI, the better
the clustering solution. The parameters for both algorithms were tuned to give
the best possible solutions. In this problem, LEGClust performs far better
than Spectral-Shi and gives results similar to (but slightly better than)
those of Spectral-Ng. Table 6.21 shows LEGClust results for different choices
of the minimum number of connections (k) to join clusters. These results show
that different values of k produce only small differences in the ARI value.
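
To make the ARI values in Table 6.21 easier to interpret, the following is a
minimal sketch of how the index can be computed from two labelings, assuming
the standard Hubert-Arabie pair-counting formulation. The function name
adjusted_rand_index and the toy label lists are illustrative assumptions; they
are not taken from the experiments in the text. Note that this chance-corrected
form can fall slightly below 0 when two labelings agree less than expected by
chance.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand Index between two labelings (illustrative sketch)."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("labelings must have the same length")
    n = len(labels_true)
    if n < 2:
        # No pairs of points to compare; treat as perfect agreement.
        return 1.0

    # Contingency counts: points sharing each (true, predicted) label pair.
    pair_counts = Counter(zip(labels_true, labels_pred))
    true_counts = Counter(labels_true)
    pred_counts = Counter(labels_pred)

    # Number of point pairs placed together in both, or in each, labeling.
    sum_cells = sum(comb(c, 2) for c in pair_counts.values())
    sum_true = sum(comb(c, 2) for c in true_counts.values())
    sum_pred = sum(comb(c, 2) for c in pred_counts.values())

    # Agreement expected by chance, and the maximum attainable agreement.
    expected = sum_true * sum_pred / comb(n, 2)
    max_index = 0.5 * (sum_true + sum_pred)
    if max_index == expected:
        # Degenerate case, e.g. both labelings form a single cluster.
        return 1.0
    return (sum_cells - expected) / (max_index - expected)


# Same partition with renamed clusters: perfect agreement.
same = adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0])
# Partition that splits every true cluster: worse than chance.
crossed = adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1])
print(same, crossed)
```

Because the index depends only on which points are grouped together, renaming
the cluster labels does not change the score, which is why it suits comparing
clustering output against known class labels.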