where $\mu_k$ is the main effect of bicluster $k$, $\alpha_{ik}$ and $\beta_{jk}$ are the effects of sample $i$ and feature $j$, respectively, in bicluster $k$, $\varepsilon_{ijk}$ is the noise term for bicluster $k$, and $e_{ij}$ models the data points that do not belong to any bicluster. Here $\delta_{ik}$ and $\kappa_{jk}$ are binary variables: $\delta_{ik} = 1$ indicates that row $i$ belongs to bicluster $k$, and $\delta_{ik} = 0$ otherwise; similarly, $\kappa_{jk} = 1$ indicates that column $j$ is in cluster $k$, and $\kappa_{jk} = 0$ otherwise. In the plain model [50], the entry $a_{ij}$ follows a similar assumption, with fewer factors considered.

In nonoverlapping feature biclustering, $\sum_{k=1}^{K} \kappa_{jk} \le 1$, and in nonoverlapping sample biclustering, $\sum_{k=1}^{K} \delta_{ik} \le 1$. Here, the nonoverlapping sample case is discussed. The priors of the indicators $\kappa$ and $\delta$ are set so that a feature can be in multiple biclusters while a sample is in at most one.

In this model, an observation $a_{ij}$ can belong to either one or none of the biclusters, and the probability distribution of $a_{ij}$ conditional on the bicluster indicators can be rewritten as
$$a_{ij} \mid \delta_{ik} = 1, \kappa_{jk} = 1 \sim N(\mu_k + \alpha_{ik} + \beta_{jk}, \sigma_{\varepsilon k}^2)$$

if $a_{ij}$ belongs to bicluster $k$; otherwise,

$$a_{ij} \mid \delta_{ik}\kappa_{jk} = 0 \text{ for all } k \sim N(0, \sigma_e^2).$$
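The conditional model above can be illustrated with a small simulation. The following is a minimal sketch, not the method of the cited work: the function name `simulate_bbc`, its argument layout, and the use of standard deviations rather than variances are assumptions made for illustration. It enforces the nonoverlapping sample constraint and draws each entry from the bicluster distribution or from the background distribution.

```python
import random


def simulate_bbc(delta, kappa, mu, alpha, beta, sigma_eps, sigma_e, seed=0):
    """Draw a data matrix a[i][j] given the bicluster indicators.

    Hypothetical layout (an assumption of this sketch):
      delta[i][k], kappa[j][k] : 0/1 row and column indicators
      mu[k], alpha[i][k], beta[j][k] : main, sample and feature effects
      sigma_eps[k], sigma_e : standard deviations of the two noise terms
    """
    rng = random.Random(seed)
    n, m, K = len(delta), len(kappa), len(mu)

    # Nonoverlapping sample constraint: sum_k delta[i][k] <= 1.
    for i in range(n):
        if sum(delta[i]) > 1:
            raise ValueError(f"row {i} belongs to more than one bicluster")

    a = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            members = [k for k in range(K) if delta[i][k] * kappa[j][k] == 1]
            if members:
                k = members[0]  # unique, because a row is in at most one bicluster
                mean = mu[k] + alpha[i][k] + beta[j][k]
                a[i][j] = rng.gauss(mean, sigma_eps[k])
            else:
                # Entry belongs to no bicluster: background noise only.
                a[i][j] = rng.gauss(0.0, sigma_e)
    return a
```

Columns may still overlap across biclusters in this sketch, because only the rows are checked; this mirrors the nonoverlapping sample setting discussed in the text.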
With Gaussian zero-mean priors on the effect parameters, the marginal distribution of the $a_{ij}$ conditional on the indicators is

$$B \mid \delta, \kappa \sim N(0, \Sigma),$$

where $\Sigma$ is the covariance matrix of $B$ and $B = \{B_0, B_1, B_2, \cdots, B_K\}^T$ with $B_k = \{a_{ij} : \delta_{ik}\kappa_{jk} = 1\}$, $k \ge 1$, and $B_0$ being the vector of data points belonging to no bicluster. More specifically, $\Sigma$ is a sparse matrix of the form

$$\Sigma = \begin{pmatrix} \sigma_e^2 I & 0 & \cdots & 0 \\ 0 & \Sigma_1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \Sigma_K \end{pmatrix},$$

where $\Sigma_k = \mathrm{Cov}(B_k, B_k)$ is the covariance matrix of all data points belonging to cluster $k$.
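The block structure of $\Sigma$ can be made concrete with a short sketch. It assumes, as an illustration rather than a statement from the text, that the effects have independent zero-mean Gaussian priors with variances named `var_mu`, `var_alpha` and `var_beta` here. Under that assumption two entries of the same bicluster share `var_mu` always, `var_alpha` when they lie in the same row, `var_beta` when they lie in the same column, and `var_eps` only on the diagonal. The helper names `bicluster_cov` and `block_diag` are hypothetical.

```python
def bicluster_cov(rows, cols, var_mu, var_alpha, var_beta, var_eps):
    """Covariance of the entries of one bicluster, in row-major order.

    Assumes independent zero-mean Gaussian priors on mu_k, alpha_ik and
    beta_jk (an assumption of this sketch, not a detail given in the text).
    """
    cells = [(i, j) for i in rows for j in cols]
    cov = []
    for (i, j) in cells:
        line = []
        for (p, q) in cells:
            c = var_mu                # every pair shares the main effect
            if i == p:
                c += var_alpha        # same row: shared sample effect
            if j == q:
                c += var_beta         # same column: shared feature effect
            if i == p and j == q:
                c += var_eps          # same entry: its own noise term
            line.append(c)
        cov.append(line)
    return cov


def block_diag(blocks):
    """Place square blocks along the diagonal of a zero matrix."""
    size = sum(len(b) for b in blocks)
    out = [[0.0] * size for _ in range(size)]
    offset = 0
    for b in blocks:
        for r, row in enumerate(b):
            for c, value in enumerate(row):
                out[offset + r][offset + c] = value
        offset += len(b)
    return out


# Background block sigma_e^2 * I for two unassigned entries, then one bicluster.
var_e = 0.5
background = [[var_e, 0.0], [0.0, var_e]]
sigma = block_diag([background, bicluster_cov([0, 1], [0, 1], 1.0, 2.0, 3.0, 4.0)])
```

Because entries in different blocks never share an effect, all off-diagonal blocks are zero, which is what makes $\Sigma$ sparse.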
To make inference from the above BBC model, Gibbs sampling is used. Initializing from a set of randomly assigned values of the $\delta$'s and $\kappa$'s, the column indicators $\kappa$ are sampled by calculating the log-probability ratio
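$$\log \frac{P(V_2 \mid \kappa_{jk} = 1, \sigma_{\mu k}^2, \sigma_{\alpha k}^2, \sigma_{\beta k}^2, \sigma_{\varepsilon k}^2, \sigma_e^2)\, P(\kappa_{jk} = 1)}{P(V_2 \mid \kappa_{jk} = 0, \sigma_{\mu k}^2, \sigma_{\alpha k}^2, \sigma_{\beta k}^2, \sigma_{\varepsilon k}^2, \sigma_e^2)\, P(\kappa_{jk} = 0)},$$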
where $V_1 = \{a_{il} : \delta_{ik} = 0 \text{ or } \kappa_{lk} = 0,\ l \ne j\}$ is the set of data points not in cluster $k$, and $V_2 = \{a_{il} : \delta_{ik} = 1, \kappa_{lk} = 1,\ l \ne j\} \cup \{a_{ij} : \delta_{ik} = 1\}$ is the set of data points that are or can be in bicluster $k$. This notation follows that in [26].
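Once the log-probability ratio is available, drawing the indicator is a Bernoulli draw whose success probability is the logistic transform of that ratio. The sketch below shows only this final step. It assumes the two log-likelihoods of $V_2$ have already been computed elsewhere, and the function name `sample_indicator` and its arguments are hypothetical.

```python
import math
import random


def sample_indicator(loglik_in, loglik_out, prior_in, rng=random):
    """Draw kappa_jk given log P(V_2 | kappa_jk = 1, ...) and log P(V_2 | kappa_jk = 0, ...).

    prior_in is P(kappa_jk = 1), strictly between 0 and 1.
    Returns (draw, p_in) where p_in = P(kappa_jk = 1 | everything else).
    """
    log_ratio = (loglik_in + math.log(prior_in)) - (
        loglik_out + math.log(1.0 - prior_in)
    )
    # Logistic transform of the log ratio, written to avoid overflow in exp().
    if log_ratio >= 0:
        p_in = 1.0 / (1.0 + math.exp(-log_ratio))
    else:
        e = math.exp(log_ratio)
        p_in = e / (1.0 + e)
    draw = 1 if rng.random() < p_in else 0
    return draw, p_in
```

Working with log-likelihoods rather than raw probabilities matters here because the Gaussian densities of a whole bicluster can easily underflow when multiplied directly.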