Recent Advances of Data Biclustering with Application in Computational Neuroscience - Computational Neuroscience

Information Technology Reference

In-Depth Information

cut

(

H 1 ,

H 2 )

cut

(

H 2 ,

H 1 )

d p 1

d p 2

where d p 1 = ∑ i ∈ ( U 1 ∪ V 1 ) d i ,

d p 2 = ∑ j ∈ ( U 2 ∪ V 2 ) d j .

Define the indicator vector as

(

n 2 +

m 2 ) / ((

n 1 +

m 1 )(

)) ,

∈

U 1 ∪

V 1 ,

− (

y i =

n 1 +

m 1 ) / ((

n 2 +

m 2 )(

)) ,

∈

U 2 ∪

V 2 ,

| =

, |

| =

, |

| =

, |

| =

where

U 1

n 1

U 2

n 2

V 1

m 1

V 2

m 2 , the objective of minimizing the

ratio cut of partition H 1

U 1

V 1

E 1

) ,

H 2

U 2

V 2

E 2

)

can be expressed as

y T Ly

min

y T y

y T e

Relax y to any real number, the solution is the eigenvector corresponding to the

second smallest eigenvalue of L [13, 17]. Thus, after obtaining the indicator for each

vertex of U

V , the corresponding subgraphs can be easily transformed back into

biclusters. Similarly, for normalized cut, define the indicator vector as

d U 2 ∪ V 2 / (

d U 1 ∪ V 1 d

) ,

∈

U 1 ∪

V 1 ,

− d U 1 ∪ V 1 / (

y i =

d U 2 ∪ V 2 d

) ,

∈

U 2 ∪

V 2 ,

where d U 1 ∪ V 1 = ∑ i ∈ U 1 ∪ V 1 d i

d U 2 ∪ V 2 = ∑ j ∈ U 2 ∪ V 2 d j , the objective of minimizing the

normalized cut of partition H 1 =(

U 1 ,

V 1 ,

E 1 ) ,

H 2 =(

U 2 ,

V 2 ,

E 2 )

can be expressed as

y T Ly

min

(6.13)

y T Dy

y T De

(6.14)

Now the solution of this programming is the eigenvector corresponding to the

generalized eigenvalue problem Ly

Dy [51]. The above programming problems

can be also modeled to mixed integer programming.

For large data matrix A , the solution of its eigenvector problem is very difficult

and a method proposed by [17]. For more details of spectral biclustering, see [22]. In

above, only two biclusters are obtained instead of K ones. For K biclusters, Dhillon

[17] used k -means algorithm [31, 57] after obtaining the indicator vector y , and

another direct approach is from [23] by defining an indicator matrix.

SAMBA. Tanay et al. [54] presented a statistical algorithmic method for biclus-

ter analysis (SAMBA) based on bipartite graph and probabilistic modeling. Under

a bipartite graph model, the weight of each edge is assigned according to a prob-

abilistic model, thus, to find biclusters of A become to find heavy subgraphs of G

with high likelihood. This method is motivated by finding the complete bipartite

subgraph(biclique) of G . The idea of SAMBA has three steps: forming the bipartite

graph and calculating weights of edges and nonedges (two models introduced in

this step: a simple model and a refined model); applying a hashing technique to find

= λ

Computational Neuroscience

Search WWH ::

Custom Search

Home