The Complexity of Feature Selection for Consistent Biclustering - Clustering Challenges in Biological Networks

Chapter 13

The Complexity of Feature Selection for Consistent Biclustering

O. Erhun Kundakcioglu and Panos M. Pardalos

Department of Industrial and Systems Engineering

University of Florida

303 Weil Hall

Gainesville, FL, 32611, USA

{erhun, pardalos}@ufl.edu

Biclustering is the simultaneous classification of samples and features such that samples from the same class have similar values for that class's characteristic features. A biclustering is consistent if, in each sample (feature) from any set, the average expression of the features (samples) that belong to the same class is greater than the average expression of the features (samples) from other classes. Supervised biclustering uses a training set to classify features, and consistency is achieved by feature selection. The worst-case complexity of this feature selection process is studied.

13.1. Introduction

Biclustering is a methodology that allows simultaneous partitioning of a set of samples and their features into classes. Samples and features classified together are expected to be highly relevant to each other, which can be observed in the intensity of their expression. The notion of consistency for biclustering is defined using the interrelation between the centroids of the sample and feature classes. Previous work on biclustering concentrated on unsupervised learning and did not consider employing a training set whose classification is given. However, with the introduction of consistent biclustering, significant progress has been made in supervised learning as well.

A data set (e.g., from microarray experiments) is normally given as a rectangular $m \times n$ matrix $A$, where each column represents a data sample (e.g., a patient) and each row represents a feature (e.g., a gene):

$$A = (a_{ij})_{m \times n}$$
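
The consistency condition stated in the abstract can be checked directly on a small matrix of this form. The following is a minimal Python sketch, not the chapter's own algorithm: the function name `is_consistent`, the integer class labels, and the toy data are assumptions made for illustration. It also assumes that "greater than the average expression from other classes" means a strict comparison against each other class separately.

```python
def class_mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


def is_consistent(A, feature_class, sample_class):
    """Check the consistency condition on an m x n matrix A.

    A[i][j] is the expression of feature i (row) in sample j (column).
    feature_class[i] and sample_class[j] are hypothetical class labels.
    """
    m, n = len(A), len(A[0])
    classes = sorted(set(feature_class) | set(sample_class))
    feats = {k: [i for i in range(m) if feature_class[i] == k] for k in classes}
    samps = {k: [j for j in range(n) if sample_class[j] == k] for k in classes}
    if any(not feats[k] or not samps[k] for k in classes):
        raise ValueError("every class needs at least one feature and one sample")

    # Each sample: features of its own class must have the largest average.
    for j in range(n):
        avg = {k: class_mean([A[i][j] for i in feats[k]]) for k in classes}
        own = sample_class[j]
        if any(avg[k] >= avg[own] for k in classes if k != own):
            return False

    # Each feature: samples of its own class must have the largest average.
    for i in range(m):
        avg = {k: class_mean([A[i][j] for j in samps[k]]) for k in classes}
        own = feature_class[i]
        if any(avg[k] >= avg[own] for k in classes if k != own):
            return False

    return True


# Toy 4 x 4 matrix (illustrative values): two feature classes, two sample classes.
A = [
    [5, 4, 1, 0],
    [4, 5, 0, 1],
    [1, 0, 5, 4],
    [0, 1, 4, 5],
]
good = is_consistent(A, feature_class=[0, 0, 1, 1], sample_class=[0, 0, 1, 1])
bad = is_consistent(A, feature_class=[0, 0, 1, 1], sample_class=[1, 1, 0, 0])
print(good, bad)
```

In the toy matrix, the first two features are highly expressed in the first two samples and the last two features in the last two samples, so the matching labeling passes the check, while swapping the sample labels fails it.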
