The Complexity of Feature Selection for Consistent Biclustering - Clustering Challenges in Biological Network


where $a_{ij}$ is the expression of the $i$-th feature in the $j$-th sample. Biclustering simultaneously classifies the samples and features (i.e., columns and rows of matrix $A$, respectively) into $k$ classes. Let $S_1, S_2, \ldots, S_k$ denote the classes of the samples (columns) and $F_1, F_2, \ldots, F_k$ denote the classes of features (rows). Formally, a biclustering can be defined as a collection of pairs of sample and feature subsets

$$\mathcal{B} = \{(S_1, F_1), (S_2, F_2), \ldots, (S_k, F_k)\}$$

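such that

$$S_1, S_2, \ldots, S_k \subseteq \{a^j\}_{j=1,\ldots,n}, \qquad \bigcup_{r=1}^{k} S_r = \{a^j\}_{j=1,\ldots,n}, \qquad S_\zeta \cap S_\xi = \emptyset \iff \zeta \neq \xi,$$

$$F_1, F_2, \ldots, F_k \subseteq \{a_i\}_{i=1,\ldots,m}, \qquad \bigcup_{r=1}^{k} F_r = \{a_i\}_{i=1,\ldots,m}, \qquad F_\zeta \cap F_\xi = \emptyset \iff \zeta \neq \xi,$$

where $\{a^j\}_{j=1,\ldots,n}$ and $\{a_i\}_{i=1,\ldots,m}$ denote the sets of columns and rows of the matrix $A$, respectively.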
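The partition conditions in this definition can be checked mechanically. The following is a minimal sketch, assuming samples and features are identified by their 0-based column and row indices and each class is given as a Python set; the function names `is_partition` and `is_biclustering` are illustrative and do not come from the source.

```python
from itertools import combinations


def is_partition(classes, universe):
    """True if the classes lie in `universe`, cover it, and are pairwise disjoint."""
    classes = [set(c) for c in classes]
    universe = set(universe)
    within = all(c <= universe for c in classes)          # each class is a subset
    covers = set().union(*classes) == universe            # union equals the whole set
    disjoint = all(a.isdisjoint(b) for a, b in combinations(classes, 2))
    return within and covers and disjoint


def is_biclustering(sample_classes, feature_classes, n, m):
    """Check the definition for an m-by-n matrix (m features, n samples)."""
    same_k = len(sample_classes) == len(feature_classes)  # both sides use k classes
    return (same_k
            and is_partition(sample_classes, range(n))
            and is_partition(feature_classes, range(m)))


# Hypothetical example: 5 samples and 4 features split into k = 2 classes.
S = [{0, 1}, {2, 3, 4}]
F = [{0, 2}, {1, 3}]
print(is_biclustering(S, F, n=5, m=4))
```

Representing classes as sets keeps the three conditions (subset, cover, disjointness) visible; a single label vector per axis would satisfy them by construction.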

The ultimate goal in a biclustering problem is to find a classification for which samples from the same class have similar values for that class's characteristic features. The visualization of a reasonable classification should reveal a block-diagonal or “checkerboard” pattern. A detailed survey on biclustering techniques can be found in [5] and [8].
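One way to see the block-diagonal pattern is to reorder the rows and columns of the matrix so that members of the same class are adjacent. The sketch below assumes NumPy and assumes that classes are encoded as integer label vectors (one label per feature and per sample); the function name `reorder_by_class` and the toy data are hypothetical.

```python
import numpy as np


def reorder_by_class(A, feature_labels, sample_labels):
    """Group rows by feature class and columns by sample class."""
    A = np.asarray(A)
    # A stable sort keeps the original relative order inside each class.
    row_order = np.argsort(feature_labels, kind="stable")
    col_order = np.argsort(sample_labels, kind="stable")
    return A[np.ix_(row_order, col_order)], row_order, col_order


# Toy 3-feature by 4-sample matrix with two interleaved classes.
A = np.array([[9, 1, 8, 2],
              [1, 9, 2, 8],
              [8, 2, 9, 1]])
feature_labels = np.array([0, 1, 0])
sample_labels = np.array([0, 1, 0, 1])

A_sorted, rows, cols = reorder_by_class(A, feature_labels, sample_labels)
print(A_sorted)
```

After reordering, the large entries of the toy matrix sit in the top-left and bottom-right blocks, which is the pattern a heat map of a reasonable classification would show.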

The concept of consistent biclustering is introduced in [3]. Formally, a biclustering $\mathcal{B}$ is consistent if, in each sample (feature) from any set $S_r$ (set $F_r$), the average expression of features (samples) that belong to the same class $r$ is greater than the average expression of features (samples) from other classes. The model for supervised biclustering involves solving a special case of the fractional 0-1 programming problem, in which consistency is achieved by feature selection. Computational results on microarray data mining problems are obtained by reformulating the problem as a linear mixed 0-1 programming problem.
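The consistency condition can be tested directly on a labeled matrix. The sketch below is an illustration under stated assumptions, not the model from [3]: it assumes NumPy, integer class labels 0..k-1, no empty classes, and it reads "greater than the average from other classes" as a strict comparison against each other class separately. The names `strictly_best` and `is_consistent` are hypothetical.

```python
import numpy as np


def strictly_best(scores, labels):
    """scores[x, r] is the class-r average for item x.
    True if, for every x, the entry of its own class strictly exceeds all others."""
    idx = np.arange(scores.shape[0])
    own = scores[idx, labels]
    others = scores.copy()
    others[idx, labels] = -np.inf           # mask out the item's own class
    return bool(np.all(own > others.max(axis=1)))


def is_consistent(A, feature_labels, sample_labels, k):
    """Check consistency of a biclustering of an m-by-n matrix A.
    Assumes every class 0..k-1 has at least one feature and one sample."""
    A = np.asarray(A, dtype=float)
    f = np.asarray(feature_labels)
    s = np.asarray(sample_labels)
    # feat_avg[r, j]: average expression, in sample j, of the features in class r.
    feat_avg = np.stack([A[f == r].mean(axis=0) for r in range(k)])
    # samp_avg[i, r]: average expression of feature i over the samples in class r.
    samp_avg = np.stack([A[:, s == r].mean(axis=1) for r in range(k)], axis=1)
    return strictly_best(feat_avg.T, s) and strictly_best(samp_avg, f)


# Toy data: 3 features, 4 samples, k = 2 classes.
A = np.array([[9, 1, 8, 2],
              [1, 9, 2, 8],
              [8, 2, 9, 1]])
print(is_consistent(A, [0, 1, 0], [0, 1, 0, 1], k=2))   # labels match the blocks
print(is_consistent(A, [1, 0, 1], [0, 1, 0, 1], k=2))   # feature labels swapped
```

In the toy data the first labeling passes and the second fails. Feature selection, as described above, would drop features until a check of this kind succeeds.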

An improved heuristic procedure is proposed in [9], where a linear programming problem with continuous variables is solved at each iteration.


Numerical
