A Novel Clustering Approach: Global Optimum Search with Enhanced Positioning - Clustering Challenges in Biological Network

Biology Reference

In-Depth Information

In addition, if it is not certain that the initial w ij values form an optimal solu-

tion, such as the case of randomly generated w ij values, it is then not included

in subsequent integer cuts. It is important to note that while the GOS algorithm

tends to give good optimal solutions, it does not have a theoretical guarantee of

returning a globally optimum solution. Hence the issue of providing the algo-

rithm with a quality initialization point is important and will be addressed later in

the paper. Note also that in a typical GBD algorithm, there may be an infeasible

primal problem, for which the problem statement would have to be reformulated

accordingly. In this case, since there is only one set of continuous variable (i.e.,

z jk ) to be solved in the primal problem, and there is always a feasible assignment

of points to clusters, leading to the calculation of the cluster centers, all primal

problems are feasible.

16.2.2.4. Determining the Optimal Number of Clusters

Most clustering algorithms do not contain screening functions to determine the

optimal number of clusters. Yet this is important to evaluate the results of clus-

ter analysis in a quantitative and objective fashion. On the other hand, while it

is relatively easy to propose indices of cluster validity, it is difficult to incorpo-

rate these measures into clustering algorithms and appoint thresholds on which

to define key decision values [27, 36]. Some of the indices used to compute

cluster validity include the Dunn's validity index [16], the Davis-Bouldin validity

index [12], the Silhouette validation technique, the C index [33], the Goodman-

Kruskal index [25], the Isolation index [52], the Jaccard index [35], and the Rand

index [54]. We note that the optimal number of clusters occurs when the inter-

cluster distance is maximized and the intra-cluster distance is minimized. We

adapt the concept of a clustering balance [39], where it has been shown to have a

minimum value when intra-cluster similarity is maximized and inter-cluster simi-

larity is minimized. This provides a measure of how optimal is a certain number

of clusters used for a particular clustering algorithm. We introduce the following:

Global Center, z k = n i =1 a ik ,

∀

k

(16.5)

Intra-Cluster Error Sum, Λ= i =1 c j =1 s k =1 w ij

2

a ik −

z jk

(16.6)

Inter-Cluster Error Sum, Γ= c j =1 s k =1

z jk

2

z jk −

(16.7)

From here, [39] proposed a clustering balance parameter, which is the α -

weighted sum of the two error sums.

Clustering Balance, ε = α Λ+(1

−

α )Γ

(16.8)

Clustering Challenges in Biological Network

Search WWH ::

Custom Search

Home