6.4.5 Experiments
The LEGClust algorithm was applied to a wide variety of artificial and
real-world datasets, some of them with a large number of features. We now
present the results of some of these experiments, which are described in [204]
and involve the real-world datasets summarized in Table 6.20. The NCI
Microarray dataset can be found in [163]; the 20NewsGroups, Dutch Handwritten
Numerals (DHN), Iris, Wdbc and Wine datasets in [13]; and the Olive dataset
in [75].
Table 6.20
Real datasets used in the experiments.

Dataset         # Objects  # Features  # Classes
20NewsGroups         1000         565         20
DHN                  2000           3         10
Iris                  150           4          3
NCI Microarray         64        6830         12
Olive                 572           8          9
Wdbc                  569          30          2
Wine                  178          13          3
The artificial two-dimensional datasets are taken from [201] and can be
found in [197]. These datasets are used to better visualize and control the
clustering process. Some examples are depicted in Fig. 6.31.
For the artificial datasets, the clustering solutions yielded by the different
algorithms were compared with the majority-choice solutions obtained in the
human clustering experiments mentioned in Sect. 6.4.4 and described in [201].
For the real-world datasets, the comparison was made with the supervised
classes. In both cases (majority choice or supervised classes), we will refer
to these solutions as reference solutions or reference clusters.
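Comparing a clustering solution with a reference solution requires some measure of agreement between two partitions of the same objects. This section does not state which measure [204] used, so the following Python sketch assumes the adjusted Rand index, a standard choice that scores 1.0 for identical partitions (whatever labels the clusters carry) and has an expected value of 0 for chance-level agreement. The function name `adjusted_rand_index` is hypothetical.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index between two labelings of the same objects.

    Returns 1.0 for identical partitions (cluster ids may be permuted)
    and a value near 0.0 for chance-level agreement.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both labelings must cover the same objects")
    n = len(labels_a)
    total_pairs = comb(n, 2)
    if total_pairs == 0:
        return 1.0  # fewer than two objects: nothing to disagree on

    # Pairs of objects placed together in both labelings (contingency-table cells).
    together_both = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    # Pairs placed together in each labeling separately (row and column sums).
    together_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    together_b = sum(comb(c, 2) for c in Counter(labels_b).values())

    expected = together_a * together_b / total_pairs
    maximum = (together_a + together_b) / 2
    if maximum == expected:
        return 1.0  # degenerate case, e.g. both labelings are a single cluster
    return (together_both - expected) / (maximum - expected)


# Example: a reference partition and two candidate solutions.
reference = [0, 0, 0, 1, 1, 1]
print(adjusted_rand_index(reference, [1, 1, 1, 0, 0, 0]))  # same partition, relabeled -> 1.0
print(adjusted_rand_index(reference, [0, 0, 1, 1, 2, 2]))  # partial agreement
```

Because the index works on pairs of objects rather than on cluster labels, it needs no explicit matching between found clusters and reference clusters, and it still applies when the two solutions have different numbers of clusters.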
The LEGClust solutions were compared with those of the following well-known
clustering algorithms: the Chameleon algorithm, included in the software
package Cluto [126]; two spectral clustering algorithms (Spectral-Ng [166]
and Spectral-Shi [209]); and one density-based algorithm, DBScan [248].
For the artificial datasets shown in Fig. 6.31, the solutions obtained with
LEGClust are presented in Fig. 6.32. Fig. 6.33 shows the solutions obtained
with the Chameleon algorithm that differ from those suggested by LEGClust.
An important aspect noticed in these experiments was that the Chameleon
algorithm produced different solutions for slightly different parameter
values. The dataset in Fig. 6.33c was reported in [204] as the one that made
the parameters of the Chameleon algorithm most difficult to tune. Such tuning
difficulties do not arise with LEGClust since, as noted before, LEGClust is
not sensitive to small changes in its parameters. A particular difference
between Chameleon and LEGClust is the curious solution given by Chameleon in
Fig. 6.33b.
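One way to make the parameter-sensitivity argument concrete is to cluster the same data at a given parameter value and at slightly perturbed values, then measure how much the solutions agree. Neither Chameleon nor LEGClust is specified in this section, so the sketch below is only an illustration under stated assumptions: it uses a minimal DBSCAN (the density-based algorithm also included in the comparison) as a stand-in, the plain Rand index as the agreement measure, and a hypothetical perturbation of plus or minus 10% of the neighborhood radius `eps`. The functions `dbscan`, `rand_index` and `sensitivity` are hypothetical names, not part of any library.

```python
from itertools import combinations
from math import dist


def dbscan(points, eps, min_pts):
    """Minimal DBSCAN: returns one cluster id per point, -1 for noise."""
    n = len(points)
    # Neighborhoods include the point itself (distance 0).
    neighbors = [[j for j in range(n) if dist(points[i], points[j]) <= eps]
                 for i in range(n)]
    labels = [None] * n
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbors[i]) < min_pts:
            labels[i] = -1  # provisional noise; may later become a border point
            continue
        cluster += 1  # i is an unvisited core point: start a new cluster
        labels[i] = cluster
        queue = list(neighbors[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reachable from a core point becomes border
            if labels[j] is not None:
                continue  # already assigned; border points are not expanded
            labels[j] = cluster
            if len(neighbors[j]) >= min_pts:
                queue.extend(neighbors[j])  # j is also a core point: keep growing
    return labels


def rand_index(a, b):
    """Fraction of object pairs on which two labelings agree (same vs. different cluster).

    Noise (-1) is treated as one more cluster, a simplification for this sketch.
    """
    pairs = list(combinations(range(len(a)), 2))
    if not pairs:
        return 1.0
    agree = sum((a[i] == a[j]) == (b[i] == b[j]) for i, j in pairs)
    return agree / len(pairs)


def sensitivity(points, eps, min_pts, rel_step=0.1):
    """Agreement between the solution at eps and at eps scaled by (1 - rel_step) and (1 + rel_step)."""
    base = dbscan(points, eps, min_pts)
    return [rand_index(base, dbscan(points, eps * f, min_pts))
            for f in (1 - rel_step, 1 + rel_step)]


# Two well-separated square blobs of four points each.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
print(dbscan(points, eps=1.5, min_pts=3))       # [0, 0, 0, 0, 1, 1, 1, 1]
print(sensitivity(points, eps=1.5, min_pts=3))  # [1.0, 1.0]
```

Values close to 1.0 for both perturbations indicate a solution that is stable under small parameter changes; values well below 1.0 would flag the kind of tuning difficulty reported for Chameleon.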
 