$$\min \; \sum_{i=1}^{n} \sum_{j=1}^{c} \sum_{k=1}^{s} w_{ij} \, (a_{ik} - z_{jk})^2 \qquad \text{(Problem 1)}$$
s.t.
$$\sum_{j=1}^{c} w_{ij} = 1, \quad \forall\, i = 1, \ldots, n$$
$w_{ij}$ are binary variables, $z_{jk}$ are continuous variables
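As a small illustration of Problem 1, the objective can be evaluated for a fixed assignment and fixed centers. This is only a sketch: the function names and the toy data are assumptions made for illustration, with `a` of shape (n, s), `w` of shape (n, c), and `z` of shape (c, s).

```python
import numpy as np

def clustering_objective(a, w, z):
    """Sum over i, j, k of w[i, j] * (a[i, k] - z[j, k])**2."""
    # diff[i, j, k] = a[i, k] - z[j, k]
    diff = a[:, None, :] - z[None, :, :]
    return float(np.sum(w[:, :, None] * diff ** 2))

def is_hard_assignment(w):
    """True when w is binary and every row (gene) sums to one."""
    binary = np.all((w == 0) | (w == 1))
    one_per_gene = np.all(w.sum(axis=1) == 1)
    return bool(binary and one_per_gene)

# Toy data (made up): n = 4 genes, s = 2 features, c = 2 clusters.
a = np.array([[0.0, 0.0], [0.0, 2.0], [10.0, 10.0], [10.0, 12.0]])
w = np.array([[1, 0], [1, 0], [0, 1], [0, 1]])
z = np.array([[0.0, 1.0], [10.0, 11.0]])

objective_value = clustering_objective(a, w, z)
```

Only the solver is allowed to choose `w` and `z`; the sketch merely scores one candidate pair.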
There are two sets of variables in the problem, $w_{ij}$ and $z_{jk}$. While the bounds
of $w_{ij}$ are clearly 0 and 1, those of $z_{jk}$ are obtained from the range of the
$a_{ik}$ values:
$$z_{jk}^{L} = \min_{i} \, [a_{ik}], \quad \forall\, k = 1, \ldots, s$$
$$z_{jk}^{U} = \max_{i} \, [a_{ik}], \quad \forall\, k = 1, \ldots, s$$
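The bounds on the cluster centers can be read directly off the data. A minimal sketch, assuming `a` is an (n, s) array and that the same per-feature range applies to every cluster; the function name is hypothetical:

```python
import numpy as np

def center_bounds(a, c):
    """Return (lower, upper), each of shape (c, s), from the range of a."""
    # The minimum and maximum are taken over genes i for each feature k,
    # then repeated for each of the c clusters.
    lower = np.tile(a.min(axis=0), (c, 1))
    upper = np.tile(a.max(axis=0), (c, 1))
    return lower, upper

# Toy data (made up): n = 3 genes, s = 2 features, c = 2 clusters.
a = np.array([[1.0, 5.0], [3.0, -2.0], [2.0, 0.5]])
lower, upper = center_bounds(a, c=2)
```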
The pre-clustering work suggests that some genes need only be considered for a
restricted number of known clusters, since it can be determined (for instance
by distance and correlation metrics) that certain genes are exceedingly dissimilar
from some of the pre-clusters and thus have virtually zero probability of being
clustered there. This restriction can be described by introducing an additional
binary parameter $\mathrm{suit}_{ij}$. A data point deemed to belong uniquely to one
cluster has $\mathrm{suit}_{ij} = 1$ for exactly one value of $j$ and zero for the
others, whereas a data point restricted to a few clusters has $\mathrm{suit}_{ij} = 1$
for only those clusters. This reduces the computational demands of the problem.
The introduction of the $\mathrm{suit}_{ij}$ parameters also obviates the need for
constraints that prevent the redundant re-indexing of clusters.
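One way such a suitability matrix might be built is from distances to the pre-cluster centers. The rule below (keep a cluster when its distance is within `ratio` times the distance to the nearest pre-cluster) is a hypothetical choice for illustration, not the criterion used in the text, and `build_suit`, `pre_centers`, and `ratio` are assumed names.

```python
import numpy as np

def build_suit(a, pre_centers, ratio=2.0):
    """Binary (n, c) matrix: 1 where gene i may be assigned to cluster j."""
    # dist[i, j] = Euclidean distance from gene i to pre-cluster center j
    dist = np.linalg.norm(a[:, None, :] - pre_centers[None, :, :], axis=2)
    nearest = dist.min(axis=1, keepdims=True)
    # Hypothetical rule: the nearest cluster is always kept (for ratio >= 1);
    # clusters much farther away than the nearest one are ruled out.
    return (dist <= ratio * nearest).astype(int)

# Toy data (made up): 3 genes, 2 pre-cluster centers.
a = np.array([[0.0, 0.0], [5.0, 0.0], [10.0, 0.0]])
pre_centers = np.array([[1.0, 0.0], [9.0, 0.0]])
suit = build_suit(a, pre_centers)

# Assignment variables that remain free after the restriction.
free_binaries = int(suit.sum())
```

In this toy case the first and last genes are each tied to a single cluster, so only the middle gene still needs a genuine assignment decision.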
Together with the first-order optimality condition (FOC), i.e. the requirement that
the vector distances from all genes in a cluster to the cluster center sum to
zero, the formulation becomes:
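$$\min \; \sum_{i=1}^{n} \sum_{k=1}^{s} a_{ik}^2 - \sum_{i=1}^{n} \sum_{j=1}^{c} \sum_{k=1}^{s} (\mathrm{suit}_{ij}) \, (a_{ik} \, w_{ij} \, z_{jk}) \qquad \text{(Problem 2)}$$
s.t.
$$(\mathrm{suit}_{ij}) \left( z_{jk} \sum_{i=1}^{n} w_{ij} - \sum_{i=1}^{n} a_{ik} \, w_{ij} \right) = 0, \quad \forall\, j, \; \forall\, k$$
$$\sum_{j=1}^{c} (\mathrm{suit}_{ij}) \, w_{ij} = 1, \quad \forall\, i$$
$$1 \le \sum_{i=1}^{n} (\mathrm{suit}_{ij}) \, w_{ij} \le n - c + 1, \quad \forall\, j$$
$$w_{ij} \in \{0, 1\}, \quad \forall\, i, \; \forall\, j$$
$$z_{jk}^{L} \le z_{jk} \le z_{jk}^{U}, \quad \forall\, j, \; \forall\, k$$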
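The role of the FOC can be checked numerically. For a fixed assignment, the FOC makes each center the mean of its cluster, and under that condition the reformulated objective, read here as the sum of squared data values minus the cross term (an assumption about the intended form), agrees with the sum-of-squares objective of Problem 1. The sketch below uses hypothetical function names and toy data, and assumes every cluster is non-empty.

```python
import numpy as np

def foc_centers(a, w):
    """Centers with z[j, k] * sum_i w[i, j] = sum_i a[i, k] * w[i, j]."""
    counts = w.sum(axis=0)              # genes per cluster, shape (c,)
    return (w.T @ a) / counts[:, None]  # cluster means, shape (c, s)

def objective_problem1(a, w, z):
    diff = a[:, None, :] - z[None, :, :]
    return float(np.sum(w[:, :, None] * diff ** 2))

def objective_problem2(a, w, z, suit):
    # sum_i sum_k a_ik^2 - sum_i sum_j sum_k suit_ij * a_ik * w_ij * z_jk
    cross = np.einsum("ij,ik,jk->", suit * w, a, z)
    return float(np.sum(a ** 2) - cross)

def is_feasible(w, suit):
    """One cluster per gene, and cluster sizes between 1 and n - c + 1."""
    n, c = w.shape
    sw = suit * w
    one_cluster_each = np.all(sw.sum(axis=1) == 1)
    sizes = sw.sum(axis=0)
    sizes_ok = np.all((sizes >= 1) & (sizes <= n - c + 1))
    return bool(one_cluster_each and sizes_ok)

# Toy data (made up): n = 4 genes, s = 2 features, c = 2 clusters.
a = np.array([[0.0, 0.0], [0.0, 2.0], [10.0, 10.0], [10.0, 12.0]])
w = np.array([[1, 0], [1, 0], [0, 1], [0, 1]])
suit = np.ones_like(w)                  # no cluster ruled out for any gene

z = foc_centers(a, w)
p1 = objective_problem1(a, w, z)
p2 = objective_problem2(a, w, z, suit)
```

The two values coincide only when `z` satisfies the FOC and each gene is assigned to exactly one cluster; for arbitrary centers they generally differ.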
The first set of constraints is the FOC, the second requires that each gene belong
to exactly one cluster, and the third states that each cluster contains at least one
and no more than $n - c + 1$ data points. Note also that the $\sum_{i=1}^{n} \sum_{k=1}^{s} a_{ik}^2$ term in