Information Technology Reference
In-Depth Information
cut
(
H 1 ,
H 2 )
cut
(
H 2 ,
H 1 )
+
,
d p 1
d p 2
where d p 1 = i ( U 1 V 1 ) d i ,
d p 2 = j ( U 2 V 2 ) d j .
Define the indicator vector as
(
n 2 +
m 2 ) / ((
n 1 +
m 1 )(
n
+
m
)) ,
i
U 1
V 1 ,
(
y i =
n 1 +
m 1 ) / ((
n 2 +
m 2 )(
n
+
m
)) ,
i
U 2
V 2 ,
|
| =
, |
| =
, |
| =
, |
| =
where
U 1
n 1
U 2
n 2
V 1
m 1
V 2
m 2 , the objective of minimizing the
ratio cut of partition H 1
=(
U 1
,
V 1
,
E 1
) ,
H 2
=(
U 2
,
V 2
,
E 2
)
can be expressed as
y T Ly
min
,
y T y
y T e
s
.
t
.
=
1
,
=
0
.
Relax y to any real number, the solution is the eigenvector corresponding to the
second smallest eigenvalue of L [13, 17]. Thus, after obtaining the indicator for each
vertex of U
V , the corresponding subgraphs can be easily transformed back into
biclusters. Similarly, for normalized cut, define the indicator vector as
,
d U 2 V 2 / (
d U 1 V 1 d
) ,
i
U 1
V 1 ,
d U 1 V 1 / (
y i =
d U 2 V 2 d
) ,
i
U 2
V 2 ,
where d U 1 V 1 = i U 1 V 1 d i
d U 2 V 2 = j U 2 V 2 d j , the objective of minimizing the
normalized cut of partition H 1 =(
,
U 1 ,
V 1 ,
E 1 ) ,
H 2 =(
U 2 ,
V 2 ,
E 2 )
can be expressed as
y T Ly
min
,
(6.13)
y T Dy
y T De
s
.
t
.
=
1
,
=
0
.
(6.14)
Now the solution of this programming is the eigenvector corresponding to the
generalized eigenvalue problem Ly
Dy [51]. The above programming problems
can be also modeled to mixed integer programming.
For large data matrix A , the solution of its eigenvector problem is very difficult
and a method proposed by [17]. For more details of spectral biclustering, see [22]. In
above, only two biclusters are obtained instead of K ones. For K biclusters, Dhillon
[17] used k -means algorithm [31, 57] after obtaining the indicator vector y , and
another direct approach is from [23] by defining an indicator matrix.
SAMBA. Tanay et al. [54] presented a statistical algorithmic method for biclus-
ter analysis (SAMBA) based on bipartite graph and probabilistic modeling. Under
a bipartite graph model, the weight of each edge is assigned according to a prob-
abilistic model, thus, to find biclusters of A become to find heavy subgraphs of G
with high likelihood. This method is motivated by finding the complete bipartite
subgraph(biclique) of G . The idea of SAMBA has three steps: forming the bipartite
graph and calculating weights of edges and nonedges (two models introduced in
this step: a simple model and a refined model); applying a hashing technique to find
= λ
 
Search WWH ::




Custom Search