Databases Reference
In-Depth Information
maximal
-bicluster of the largest size is computationally costly. Therefore, we can use
a heuristic greedy search method to obtain a local optimal cluster. The algorithm works
in two phases.
In the deletion phase , we start from the whole matrix. While the mean-squared
residue of the matrix is over
, we iteratively remove rows and columns. At each
iteration, for each row i , we compute the mean-squared residue as
X
j 2 J .
1
j J j
2 .
d
.
i
/D
e ij e iJ e Ij C e IJ /
(11.21)
Moreover, for each column j , we compute the mean-squared residue as
X
i 2 I .
1
j I j
2 .
d
.
j
/D
e ij e iJ e Ij C e IJ /
(11.22)
We remove the row or column of the largest mean-squared residue. At the end of this
phase, we obtain a submatrix I J that is a
-bicluster. However, the submatrix may
not be maximal.
In the addition phase , we iteratively expand the
-bicluster I J obtained in the dele-
tion phase as long as the
-bicluster requirement is maintained. At each iteration, we
consider rows and columns that are not involved in the current bicluster I J by cal-
culating their mean-squared residues. A row or column of the smallest mean-squared
residue is added into the current
-bicluster.
-bicluster only. To find multiple biclusters that
do not have heavy overlaps, we can run the algorithm multiple times. After each execu-
tion where a
This greedy algorithm can find one
-bicluster is output, we can replace the elements in the output bicluster
by random numbers. Although the greedy algorithm may find neither the optimal
biclusters nor all biclusters, it is very fast even on large matrices.
EnumeratingAllBiclustersUsingMaPle
As mentioned, a submatrix I J is a bicluster with coherent values if and only if for any
i 1 , i 2 2 I and j 1 , j 2 2 J , e i 1 j 1 e i 2 j 1 D e i 1 j 2 e i 2 j 2 . For any 22 submatrix of I J , we can
define a p-score as
e i 1 j 1
!
e i 1 j 2
p -score
Dj.
e i 1 j 1 e i 2 j 1 /.
e i 1 j 2 e i 2 j 2 /j.
(11.23)
e i 2 j 1
e i 2 j 2
-pCluster (for p attern-based cluster ) if the p -score of every
22 submatrix of I J is at most
A submatrix I J is a
0 is a threshold specifying a user's
tolerance of noise against a perfect bicluster. Here, the p -score controls the noise on
every element in a bicluster, while the mean-squared residue captures the average noise.
An interesting property of
, where
-pCluster is that if I J is a
-pCluster, then every
x y
.
x , y 2
/
submatrix of I J is also a
-pCluster. This monotonicity enables
 
Search WWH ::




Custom Search