Information Technology Reference
In-Depth Information
cut
(
H
1
,
H
2
)
cut
(
H
2
,
H
1
)
+
,
d
p
1
d
p
2
where
d
p
1
=
∑
i
∈
(
U
1
∪
V
1
)
d
i
,
d
p
2
=
∑
j
∈
(
U
2
∪
V
2
)
d
j
.
Define the indicator vector as
(
n
2
+
m
2
)
/
((
n
1
+
m
1
)(
n
+
m
))
,
i
∈
U
1
∪
V
1
,
−
(
y
i
=
n
1
+
m
1
)
/
((
n
2
+
m
2
)(
n
+
m
))
,
i
∈
U
2
∪
V
2
,
|
|
=
, |
|
=
, |
|
=
, |
|
=
where
U
1
n
1
U
2
n
2
V
1
m
1
V
2
m
2
, the objective of minimizing the
ratio cut of partition
H
1
=(
U
1
,
V
1
,
E
1
)
,
H
2
=(
U
2
,
V
2
,
E
2
)
can be expressed as
y
T
Ly
min
,
y
T
y
y
T
e
s
.
t
.
=
1
,
=
0
.
Relax
y
to any real number, the solution is the eigenvector corresponding to the
second smallest eigenvalue of
L
[13, 17]. Thus, after obtaining the indicator for each
vertex of
U
V
, the corresponding subgraphs can be easily transformed back into
biclusters. Similarly, for normalized cut, define the indicator vector as
,
d
U
2
∪
V
2
/
(
d
U
1
∪
V
1
d
)
,
i
∈
U
1
∪
V
1
,
−
d
U
1
∪
V
1
/
(
y
i
=
d
U
2
∪
V
2
d
)
,
i
∈
U
2
∪
V
2
,
where
d
U
1
∪
V
1
=
∑
i
∈
U
1
∪
V
1
d
i
d
U
2
∪
V
2
=
∑
j
∈
U
2
∪
V
2
d
j
, the objective of minimizing the
normalized cut of partition
H
1
=(
,
U
1
,
V
1
,
E
1
)
,
H
2
=(
U
2
,
V
2
,
E
2
)
can be expressed as
y
T
Ly
min
,
(6.13)
y
T
Dy
y
T
De
s
.
t
.
=
1
,
=
0
.
(6.14)
Now the solution of this programming is the eigenvector corresponding to the
generalized eigenvalue problem
Ly
Dy
[51]. The above programming problems
can be also modeled to mixed integer programming.
For large data matrix
A
, the solution of its eigenvector problem is very difficult
and a method proposed by [17]. For more details of spectral biclustering, see [22]. In
above, only two biclusters are obtained instead of
K
ones. For
K
biclusters, Dhillon
[17] used
k
-means algorithm [31, 57] after obtaining the indicator vector
y
, and
another direct approach is from [23] by defining an indicator matrix.
SAMBA.
Tanay et al. [54] presented a statistical algorithmic method for biclus-
ter analysis (SAMBA) based on bipartite graph and probabilistic modeling. Under
a bipartite graph model, the weight of each edge is assigned according to a prob-
abilistic model, thus, to find biclusters of
A
become to find heavy subgraphs of
G
with high likelihood. This method is motivated by finding the complete bipartite
subgraph(biclique) of
G
. The idea of SAMBA has three steps: forming the bipartite
graph and calculating weights of edges and nonedges (two models introduced in
this step: a simple model and a refined model); applying a hashing technique to find
=
λ