Biology Reference
In-Depth Information
In addition, if it is not certain that the initial w ij values form an optimal solu-
tion, such as the case of randomly generated w ij values, it is then not included
in subsequent integer cuts. It is important to note that while the GOS algorithm
tends to give good optimal solutions, it does not have a theoretical guarantee of
returning a globally optimum solution. Hence the issue of providing the algo-
rithm with a quality initialization point is important and will be addressed later in
the paper. Note also that in a typical GBD algorithm, there may be an infeasible
primal problem, for which the problem statement would have to be reformulated
accordingly. In this case, since there is only one set of continuous variable (i.e.,
z jk ) to be solved in the primal problem, and there is always a feasible assignment
of points to clusters, leading to the calculation of the cluster centers, all primal
problems are feasible.
16.2.2.4. Determining the Optimal Number of Clusters
Most clustering algorithms do not contain screening functions to determine the
optimal number of clusters. Yet this is important to evaluate the results of clus-
ter analysis in a quantitative and objective fashion. On the other hand, while it
is relatively easy to propose indices of cluster validity, it is difficult to incorpo-
rate these measures into clustering algorithms and appoint thresholds on which
to define key decision values [27, 36]. Some of the indices used to compute
cluster validity include the Dunn's validity index [16], the Davis-Bouldin validity
index [12], the Silhouette validation technique, the C index [33], the Goodman-
Kruskal index [25], the Isolation index [52], the Jaccard index [35], and the Rand
index [54]. We note that the optimal number of clusters occurs when the inter-
cluster distance is maximized and the intra-cluster distance is minimized. We
adapt the concept of a clustering balance [39], where it has been shown to have a
minimum value when intra-cluster similarity is maximized and inter-cluster simi-
larity is minimized. This provides a measure of how optimal is a certain number
of clusters used for a particular clustering algorithm. We introduce the following:
Global Center, z k = n i =1 a ik ,
k
(16.5)
Intra-Cluster Error Sum, Λ= i =1 c j =1 s k =1 w ij
2
a ik
z jk
(16.6)
Inter-Cluster Error Sum, Γ= c j =1 s k =1
z jk
2
z jk
(16.7)
From here, [39] proposed a clustering balance parameter, which is the α -
weighted sum of the two error sums.
Clustering Balance, ε = α Λ+(1
α
(16.8)
Search WWH ::




Custom Search