Information Technology Reference
In-Depth Information
denoting the expression level of feature
j
in sample
i
. Mostly, the matrix
A
is the
required input of an algorithm, but some algorithms also use the space of samples
or features.
Let
is also
called the feature vector of sample
i
. Similarly, for the features, it is denoted by
S
=
{
S
1
,
S
2
, ···,
S
n
}
be the sample set, where
S
i
=(
a
i
1
,
a
i
2
, ···,
a
im
)
F
=
T
, a column vector. Thus,
{
F
1
,
F
2
, ···,
F
m
}
with each vector
F
j
=(
a
1
j
,
a
2
j
, ···,
a
nj
)
T
the matrix
A
.
A bicluster is a submatrix of data matrix. It is denoted by
B
k
=(
S
k
,F
k
)
=(
S
1
,
S
2
, ···,
S
n
)
=(
F
1
,
F
2
, ···,
F
m
)
sat-
isfying that
and the entry denotes intersection entry with corre-
sponding row (sample) and column in both
A
and
B
k
. Assume that there are
K
biclusters founded in data matrix
A
; the set of biclusters is denoted by
S
k
⊆S
,
F
k
⊆F
B
=
{
B
k
:
k
=
1
,
2
, ··· ,
K
}
. Sometimes, we use
(
S
k
,F
)
to denote a cluster of rows (sam-
ples) and use
a cluster of columns (features). In some algorithms, the
number of row clusters is not equal to that of column clusters. Let
K
(
S,F
k
)
K
denote
the number of row clusters, column clusters, respectively, the set of biclusters is
B
,
k
=
K
}
=
{
(
S
k
,F
k
)
:
k
=
1
, ···,
K
,
1
, ···,
. Without explanation, we assume that
K
.
Additionally,
K
=
|S
k
|
denotes the cardinality of itself, i.e., the number of samples
in bicluster
B
k
=(
S
k
,F
k
)
|F
k
|
while for
, similarly, the number of features. Clearly,
|S|
=
n
, |F|
=
m
. In the following, the notation
i
∈S
k
(
j
∈F
k
) is short for
S
i
∈S
k
(
F
j
∈F
k
) without misleading.
Given a data matrix
A
, the biclustering problem is to design algorithms to find
biclusters
B
=
{
B
k
:
k
=
1
,
2
, ···,
K
}
of it, i.e., a subset of matrices of
A
such that
samples (rows,
S
k
) of each bicluster
B
k
exhibit some similar behavior under the
corresponding features (columns,
F
k
). From this point, a bicluster problem now is
transformed into a mathematical problem satisfying some requirements (which will
be defined in the following under different bicluster types and structures). Usually,
after finding biclusters in a data matrix, the rows and columns are rearranged so that
the samples/features in a same bicluster will be together, the resulted matrix is called
a proper rearrangement matrix. In the following discussions of bicluster types and
biclustering structures, the requirements are all based on the rearrangement of data
matrix.
6.2.2 Bicluster Types
The types of a bicluster is defined to be the relationships of entries within a bicluster.
As mentioned in Section 1.4, Madeira and Oliveira [37] have identified bicluster
types into following four major classes and here we follow their classification and
give the mathematical representations. For first three cases, the data matrix
A
is
required that
A
R
2
, i.e., all entries in
A
are real numbers.
∈
1. Bicluster with constant values. For a bicluster
B
k
=(
S
k
,F
k
)
, the following iden-
tity should be satisfied: