Information Technology Reference
In-Depth Information
1
|S
k
|
∑
μ
(
c
)
jk
=
a
ij
,
(6.8)
i
∈
S
k
and the mean of all the entries in
B
k
is
μ
k
=
∑
∈
F
k
a
ij
|F
k
||S
k
|
∑
i
∈
S
k
j
.
(6.9)
The residue of the entry
a
ij
in bicluster
B
k
is
a
ij
−
μ
(
r
)
ik
−
μ
(
c
)
r
ij
=
jk
+
μ
k
,
(6.10)
the variance of bicluster
B
k
is
B
k
)=
∑
i
∈S
k
∑
2
Va r
(
j
∈F
k
(
a
ij
−
μ
k
)
,
(6.11)
and mean squared residue of the bicluster
B
k
is
H
k
=
∑
i
∈S
k
∑
j
∈F
k
r
ij
|F
k
||S
k
|
.
(6.12)
The first approach of biclustering by Hartigan [28] is known as
block clustering
,
with the objective function as
K
k
=
1
Va r
(
B
k
)=
K
k
=
1
∑
i
∈
S
k
∑
2
min Var
(
B
)=
j
∈
F
k
(
a
ij
−
μ
k
)
,
where the number of biclusters is a given number. For each bicluster, the variance
Va r
(
B
k
)
is 0 if it is constant.
CC.
Cheng and Church's Algorithm (CC) [11] defines a bicluster to be a sub-
matrix for which the mean squared residue score is below a user-defined threshold
δ
represents the minimum possible value. To find the largest
bicluster in
A
, they propose a two-phase strategy: removing rows and columns and
then adding the removed rows and columns with some rules. First, the row to be
removed is the one
, i.e.,
H
k
≤
δ
, where
δ
1
|F
k
|
∑
r
ij
,
arg max
i
j
∈
F
k
and column is
1
|S
k
|
j
∈
S
k
r
ij
.
Repeating these removing steps until the bicluster with
H
k
≤
δ
arg max
j
obtained. Then some
previously removed rows and columns can be added without violating the require-
ment of
H
k
≤
δ
. Yang et al. [58, 59] proposed an improved version of this algorithm
which allows missing data entry of
A
with a heuristic flexible overlapped clustering
(FLOC) algorithm.