[Figure: 15 numbered points in two panels, (a) and (b), with Cluster 1 to Cluster 4 marked.]
Fig. 6.30 First steps of a clustering process: (a) first layer connections and resulting
elementary clusters; (b) second layer connections.
The process is repeated, and the algorithm stops when only one cluster
remains or when the number of clusters is the same in consecutive steps.
For this simple example the number of clusters at each step was 4-2-2-2-1,
with the same number of clusters (2) in steps 2, 3 and 4; therefore, one would
normally consider 2 as the acceptable number of clusters.
Several refinements of the algorithm are also proposed in [198], namely
strategies to avoid outliers, noise and micro clusters, and different ways of
choosing the number of connections used to join clusters.
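As an illustration of how the sequence 4-2-2-2-1 leads to the choice of 2
clusters, the sketch below picks the cluster count that persists over the
longest run of consecutive steps. This is only one possible reading of the
rule described above, not the procedure of [198]; the function name
pick_cluster_count and the tie-breaking choice (the earliest run wins) are
assumptions made for this example.

```python
from itertools import groupby


def pick_cluster_count(counts):
    """Return the cluster count with the longest run of consecutive repeats.

    `counts` holds the number of clusters found at each step, in order.
    Ties go to the run that appears first. Returns None when no count
    is repeated in consecutive steps.
    """
    best_value, best_length = None, 1
    for value, run in groupby(counts):
        length = len(list(run))
        # Only runs of length >= 2 qualify, and a later run must be
        # strictly longer to replace an earlier one.
        if length > best_length:
            best_value, best_length = value, length
    return best_value


# Sequence from the example in the text.
print(pick_cluster_count([4, 2, 2, 2, 1]))
```

For the sequence in the text, the count 2 is held for three consecutive
steps, so it is the value returned.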
6.4.4.4 Parameter Selection
Number of Nearest Neighbors
The first parameter one must choose in the clustering process is the number
of nearest neighbors (M). There is no specific rule for this choice. However,
since the maximum number of steps in the clustering process is related to
the number of nearest neighbors, one should not choose a very small value,
because a minimum number of steps is needed to guarantee reaching a
solution. Choosing a relatively high value for M is also not a good alternative,
because one loses information about the local structure, which is the main
focus of the algorithm.
Based on the experiments reported in [204] on several datasets, a rule of
thumb of using an M value not higher than 10% of the dataset size seems
appropriate. Note that, since entropy computation has complexity
$O\left(N \binom{M}{2}^{2}\right)$, the value of M has a large influence on the
computational time. Hence, for large datasets a smaller M is recommended,
down to 2% of the data size.
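The rule of thumb can be turned into a small helper, sketched below. It
assumes that the complexity expression reads N times the square of the
binomial coefficient "M choose 2" (a reconstruction, since the formula is
damaged in this copy), and it treats that expression as a relative cost
rather than a running time. The names suggest_m and relative_entropy_cost,
and the lower bound of 2 neighbors, are hypothetical choices for this
example.

```python
import math


def suggest_m(n_samples, percent=10, minimum=2):
    """Suggest a number of nearest neighbors M as a percentage of the data size.

    percent=10 is the upper bound given by the rule of thumb; percent=2 is
    the lower end mentioned for large datasets. The `minimum` of 2 is an
    assumption, chosen so that at least one pair of neighbors exists.
    """
    if n_samples < 3:
        raise ValueError("need at least 3 samples")
    if not 0 < percent <= 100:
        raise ValueError("percent must be in (0, 100]")
    m = max(minimum, n_samples * percent // 100)
    # M cannot exceed the number of other points in the dataset.
    return min(m, n_samples - 1)


def relative_entropy_cost(n_samples, m):
    """Relative cost N * C(M, 2)**2 under the assumed complexity expression."""
    return n_samples * math.comb(m, 2) ** 2


n = 500
for pct in (10, 2):
    m = suggest_m(n, pct)
    print(pct, m, relative_entropy_cost(n, m))
```

Because the assumed cost grows roughly with the fourth power of M, moving
from 10% to 2% of the data size reduces the estimate by a factor of several
hundred for a dataset of 500 points.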
 