Information Technology Reference
In-Depth Information
denoting the expression level of feature j in sample i . Mostly, the matrix A is the
required input of an algorithm, but some algorithms also use the space of samples
or features.
Let
is also
called the feature vector of sample i . Similarly, for the features, it is denoted by
S = {
S 1 ,
S 2 , ···,
S n }
be the sample set, where S i =(
a i 1 ,
a i 2 , ···,
a im )
F =
T , a column vector. Thus,
{
F 1 ,
F 2 , ···,
F m }
with each vector F j =(
a 1 j ,
a 2 j , ···,
a nj )
T
the matrix A
.
A bicluster is a submatrix of data matrix. It is denoted by B k =( S k ,F k )
=(
S 1 ,
S 2 , ···,
S n )
=(
F 1 ,
F 2 , ···,
F m )
sat-
isfying that
and the entry denotes intersection entry with corre-
sponding row (sample) and column in both A and B k . Assume that there are K
biclusters founded in data matrix A ; the set of biclusters is denoted by
S k ⊆S
,
F k ⊆F
B = {
B k :
k
=
1
,
2
, ··· ,
K
}
. Sometimes, we use
( S k ,F )
to denote a cluster of rows (sam-
ples) and use
a cluster of columns (features). In some algorithms, the
number of row clusters is not equal to that of column clusters. Let K
( S,F k )
K denote
the number of row clusters, column clusters, respectively, the set of biclusters is
B
,
k =
K }
= { ( S k ,F k )
: k
=
1
, ···,
K
,
1
, ···,
. Without explanation, we assume that
K .
Additionally,
K
=
|S k |
denotes the cardinality of itself, i.e., the number of samples
in bicluster B k =( S k ,F k )
|F k |
while for
, similarly, the number of features. Clearly,
|S| =
n
, |F| =
m . In the following, the notation i
∈S k ( j
∈F k ) is short for S i
∈S k
( F j ∈F k ) without misleading.
Given a data matrix A , the biclustering problem is to design algorithms to find
biclusters
B = {
B k : k
=
1
,
2
, ···,
K
}
of it, i.e., a subset of matrices of A such that
samples (rows,
S k ) of each bicluster B k exhibit some similar behavior under the
corresponding features (columns,
F k ). From this point, a bicluster problem now is
transformed into a mathematical problem satisfying some requirements (which will
be defined in the following under different bicluster types and structures). Usually,
after finding biclusters in a data matrix, the rows and columns are rearranged so that
the samples/features in a same bicluster will be together, the resulted matrix is called
a proper rearrangement matrix. In the following discussions of bicluster types and
biclustering structures, the requirements are all based on the rearrangement of data
matrix.
6.2.2 Bicluster Types
The types of a bicluster is defined to be the relationships of entries within a bicluster.
As mentioned in Section 1.4, Madeira and Oliveira [37] have identified bicluster
types into following four major classes and here we follow their classification and
give the mathematical representations. For first three cases, the data matrix A is
required that A
R 2 , i.e., all entries in A are real numbers.
1. Bicluster with constant values. For a bicluster B k =( S k ,F k )
, the following iden-
tity should be satisfied:
Search WWH ::




Custom Search