heaviest bicliques (biclusters) in the graph; and performing a local improvement procedure on the biclusters in each heap.
Given a data matrix $A$, the corresponding bipartite graph is $G = (U, V, E)$. A bicluster corresponds to a subgraph $H = (U', V', E')$ as introduced above. The weight of a subgraph is the sum of the assigned weights of its edges $(u,v) \in E'$ and nonedges $(u,v) \in (U' \times V') \setminus E'$. The assigned weights give the subgraph its statistical significance, and finding a bicluster amounts to searching for a heavy subgraph with respect to this weight. There are two models introduced in [54]: a simple model and a refined model.
In the simple model, assume that edges occur independently and equiprobably with density $p = |E| / (|U|\,|V|)$. Let $BT(k, p, n)$, the binomial tail, denote the probability of observing $k$ or more successes in $n$ independent trials, each succeeding with probability $p$. The probability of observing a graph at least as dense as $H$ is $p(H) = BT(k, p, nm)$, where $k$, $n$, and $m$ denote, respectively, the number of edges and the two vertex-set sizes of $H = (U', V', E')$. Finding a maximum weight subgraph of $G$ is equivalent to finding a subgraph $H$ with the lowest $p(H)$.
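The binomial tail $BT(k, p, n)$ can be evaluated directly from its definition; the following sketch (with toy numbers, not taken from [54]) computes $p(H)$ for a small subgraph:

```python
from math import comb

def binomial_tail(k, p, n):
    """BT(k, p, n): probability of observing k or more successes in
    n independent trials, each succeeding with probability p."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# Toy subgraph H with |U'| = 3, |V'| = 4 and k = 10 of the 12 possible
# edges, against a background edge density p = 0.5.
print(binomial_tail(10, 0.5, 3 * 4))  # a small value: H is unusually dense
```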
In the refined model, each edge $(u,v)$ is an independent Bernoulli variable with parameter $p_{u,v}$, which is the fraction of bipartite graphs with degree sequence identical to $G$ that contain the edge $(u,v)$. The probability of observing $H$ is
$$p(H) = \prod_{(u,v) \in E'} p_{u,v} \prod_{(u,v) \in (U' \times V') \setminus E'} (1 - p_{u,v}).$$
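A minimal sketch of this product, using made-up edge probabilities $p_{u,v}$ for a $2 \times 2$ subgraph:

```python
from itertools import product

# Hypothetical per-pair edge probabilities p_uv (values made up for
# illustration).
p_uv = {(0, 0): 0.9, (0, 1): 0.2, (1, 0): 0.7, (1, 1): 0.4}

def prob_observing(U_sub, V_sub, E_sub, p_uv):
    """p(H) = product of p_uv over edges times (1 - p_uv) over nonedges."""
    prob = 1.0
    for u, v in product(U_sub, V_sub):
        prob *= p_uv[(u, v)] if (u, v) in E_sub else 1 - p_uv[(u, v)]
    return prob

# H contains the edges (0, 0) and (1, 0); the other two pairs are nonedges.
print(prob_observing([0, 1], [0, 1], {(0, 0), (1, 0)}, p_uv))
# -> 0.9 * 0.7 * (1 - 0.2) * (1 - 0.4)
```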
In practice, a log-likelihood ratio is used, i.e.,
$$\log L(H) = \sum_{(u,v) \in E'} \log \frac{p_c}{p_{u,v}} + \sum_{(u,v) \in (U' \times V') \setminus E'} \log \frac{1 - p_c}{1 - p_{u,v}},$$
where $p_c > \max_{(u,v) \in U \times V} p_{u,v}$ is a constant. This ratio corresponds to the weight of the subgraph $H$ when each edge $(u,v)$ is assigned the weight $\log \frac{p_c}{p_{u,v}}$ and each nonedge the weight $\log \frac{1 - p_c}{1 - p_{u,v}}$.
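Because the log-likelihood ratio decomposes into per-pair weights, it can be summed directly; the sketch below again uses made-up $p_{u,v}$ values and an illustrative choice of $p_c$:

```python
from itertools import product
from math import log

# Hypothetical edge probabilities as in the refined model; p_c must
# exceed every p_uv.
p_uv = {(0, 0): 0.9, (0, 1): 0.2, (1, 0): 0.7, (1, 1): 0.4}
p_c = 0.95

def log_likelihood(U_sub, V_sub, E_sub, p_uv, p_c):
    """log L(H): log(p_c / p_uv) over edges (positive terms) plus
    log((1 - p_c) / (1 - p_uv)) over nonedges (negative terms)."""
    total = 0.0
    for u, v in product(U_sub, V_sub):
        if (u, v) in E_sub:
            total += log(p_c / p_uv[(u, v)])
        else:
            total += log((1 - p_c) / (1 - p_uv[(u, v)]))
    return total

print(log_likelihood([0, 1], [0, 1], {(0, 0), (1, 0)}, p_uv, p_c))
```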
Note that $\log \frac{p_c}{p_{u,v}} > 0$ for each edge $(u,v)$ and $\log \frac{1 - p_c}{1 - p_{u,v}} < 0$ for each nonedge $(u,v)$. A hashing technique is then applied to solve the maximum biclique problem in order to find the heavy subgraphs (biclusters). The final step of local improvement iteratively applies the best modification to the bicluster.
In a recent study, Tanay et al. [53] extended SAMBA to integrate multiple types of experimental data.
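The local improvement step can be sketched as a greedy search that repeatedly applies the single best vertex addition or deletion while it strictly increases the subgraph weight. The toy graph and the constant $\pm 1$ weights below are illustrative stand-ins for the model-derived log weights, not the actual SAMBA implementation:

```python
from itertools import product

def weight(U_sub, V_sub, E, w_edge=1.0, w_nonedge=-1.0):
    # Constant +/-1 weights stand in for the model-derived log weights.
    return sum(w_edge if (u, v) in E else w_nonedge
               for u, v in product(U_sub, V_sub))

def local_improve(U_sub, V_sub, U, V, E):
    """Greedily apply the best single vertex addition or deletion
    for as long as it strictly increases the subgraph weight."""
    U_sub, V_sub = set(U_sub), set(V_sub)
    while True:
        best_w, best_move = weight(U_sub, V_sub, E), None
        for x in U:                                  # toggle a row vertex
            w = weight(U_sub ^ {x}, V_sub, E)
            if w > best_w:
                best_w, best_move = w, ('U', x)
        for y in V:                                  # toggle a column vertex
            w = weight(U_sub, V_sub ^ {y}, E)
            if w > best_w:
                best_w, best_move = w, ('V', y)
        if best_move is None:
            return U_sub, V_sub                      # local optimum reached
        side, x = best_move
        if side == 'U':
            U_sub ^= {x}
        else:
            V_sub ^= {x}

# Hypothetical toy graph: the seed bicluster ({0}, {0}) grows to ({0, 1}, {0}).
U, V = [0, 1, 2], [0, 1]
E = {(0, 0), (0, 1), (1, 0), (2, 1)}
print(local_improve([0], [0], U, V, E))
```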
6.3.4 Based on Information Theory
In [18], Dhillon et al. proposed a biclustering algorithm based on information theory. This information-theoretic algorithm, which simultaneously clusters both the rows and the columns, is called co-clustering by Dhillon et al.
 