Biology Reference
In-Depth Information
but most of the genes will belong to multiple pre-clusters, and the number of
maximal cliques formed is small. On the other hand, an unnecessarily strict cut-
off results in a small number of pre-clusters, thus not accurately reflecting the
extent of relatedness between the data. In pre-clustering over a range of cut-off
values, we would then be able to select the optimum criterion as the point where
the maximum number of complete cliques is formed [64].
Iterative Clustering: We let the initial cluster set be defined by the unique
genes pre-clustered in the previous step and compute the cluster centers. We next
compute the distance between each of the remaining genes and these initial cen-
ters and as a good initialization point placed these genes into the nearest cluster
based on:
s
z initial
jk ) 2 ,
min[
( a ik
j ] ,
i
unique
k =1
We then create a rank-order list for each of the remaining genes for its dis-
tance to each of the initial clusters, and for each gene allow its suitability in 4
nearest clusters via its suit ij parameters. For this particular dataset, a separate
study (results not shown here) has indicated that the clustering results cease to
change significantly once the number of suit ij values for each gene exceed 4.
The initialization point and the suit ij parameter assignments are then utilized in
the primal problem of the GOS algorithm as described in Problem 7.1 to solve for
z jk . These, together with the Lagrange multipliers, are inputted into the master
problem (Problem 7.2) to solve for w ij . The primal problem gives an upper bound
solution and the master problem provides a lower bound. The optimal solution is
obtained when the lower and upper bounds converge. Then, the worst placed gene
based on:
c
s
z updated
jk
) 2 ,
max[
( a ik
i
unique ]
j =1
k =1
is removed and used as a seed for a new cluster with center z new . This gene has
already been subjected to the initial search for membership so there is no reason
for it to belong to any one of the older clusters. Based on z new and z updated
(updated without the worst-placed gene), the iterative steps are repeated, by se-
lecting a new initialization point, assignment suit ij parameters, and running the
GOS algorithm again. With these iterations, the number of clusters builds up from
the initial number defined by the pre-clustering work, until the optimal number of
clusters is attained. Our proposed clustering methodology can be summarized by
the schematic in Fig. 16.1.
Search WWH ::




Custom Search