Recent Advances of Data Biclustering with Application in Computational Neuroscience - Computational Neuroscience

Information Technology Reference

In-Depth Information

denoting the expression level of feature j in sample i . Mostly, the matrix A is the

required input of an algorithm, but some algorithms also use the space of samples

or features.

Let

is also

called the feature vector of sample i . Similarly, for the features, it is denoted by

S = {

S 1 ,

S 2 , ···,

S n }

be the sample set, where S i =(

a i 1 ,

a i 2 , ···,

a im )

F =

T , a column vector. Thus,

{

F 1 ,

F 2 , ···,

F m }

with each vector F j =(

a 1 j ,

a 2 j , ···,

a nj )

the matrix A

A bicluster is a submatrix of data matrix. It is denoted by B k =( S k ,F k )

S 1 ,

S 2 , ···,

S n )

F 1 ,

F 2 , ···,

F m )

sat-

isfying that

and the entry denotes intersection entry with corre-

sponding row (sample) and column in both A and B k . Assume that there are K

biclusters founded in data matrix A ; the set of biclusters is denoted by

S k ⊆S

F k ⊆F

B = {

B k :

, ··· ,

}

. Sometimes, we use

( S k ,F )

to denote a cluster of rows (sam-

ples) and use

a cluster of columns (features). In some algorithms, the

number of row clusters is not equal to that of column clusters. Let K

( S,F k )

K denote

the number of row clusters, column clusters, respectively, the set of biclusters is

k =

K }

= { ( S k ,F k )

: k

, ···,

. Without explanation, we assume that

K .

Additionally,

|S k |

denotes the cardinality of itself, i.e., the number of samples

in bicluster B k =( S k ,F k )

|F k |

while for

, similarly, the number of features. Clearly,

|S| =

, |F| =

m . In the following, the notation i

∈S k ( j

∈F k ) is short for S i

∈S k

( F j ∈F k ) without misleading.

Given a data matrix A , the biclustering problem is to design algorithms to find

biclusters

B = {

B k : k

, ···,

}

of it, i.e., a subset of matrices of A such that

samples (rows,

S k ) of each bicluster B k exhibit some similar behavior under the

corresponding features (columns,

F k ). From this point, a bicluster problem now is

transformed into a mathematical problem satisfying some requirements (which will

be defined in the following under different bicluster types and structures). Usually,

after finding biclusters in a data matrix, the rows and columns are rearranged so that

the samples/features in a same bicluster will be together, the resulted matrix is called

a proper rearrangement matrix. In the following discussions of bicluster types and

biclustering structures, the requirements are all based on the rearrangement of data

matrix.

6.2.2 Bicluster Types

The types of a bicluster is defined to be the relationships of entries within a bicluster.

As mentioned in Section 1.4, Madeira and Oliveira [37] have identified bicluster

types into following four major classes and here we follow their classification and

give the mathematical representations. For first three cases, the data matrix A is

required that A

R 2 , i.e., all entries in A are real numbers.

∈

1. Bicluster with constant values. For a bicluster B k =( S k ,F k )

, the following iden-

tity should be satisfied:

Computational Neuroscience

Search WWH ::

Custom Search

Home