Chapter 13
The Complexity of Feature Selection for Consistent Biclustering
O. Erhun Kundakcioglu and Panos M. Pardalos
Department of Industrial and Systems Engineering
University of Florida
303 Weil Hall
Gainesville, FL, 32611, USA
{erhun, pardalos}@ufl.edu
Biclustering is the simultaneous classification of samples and features such that
samples from the same class have similar values for that class's characteristic
features. A biclustering is consistent if, in each sample (feature) from any set, the
average expression of the features (samples) that belong to the same class is greater
than the average expression of the features (samples) from other classes. Supervised
biclustering uses a training set to classify features, with consistency achieved
by feature selection. The worst-case complexity of this feature selection process
is studied.
13.1. Introduction
Biclustering is a methodology that simultaneously partitions a set of samples
and their features into classes. Samples and features classified together are
expected to be highly relevant to each other, which can be observed in the
intensity of their expressions. The notion of consistency for biclustering is
defined using the interrelation between the centroids of sample and feature
classes. Previous work on biclustering concentrated on unsupervised learning and
did not consider a training set whose classification is given. With the
introduction of consistent biclustering, however, significant progress has been
made in supervised learning as well.
A data set (e.g., from microarray experiments) is normally given as a rectangular
$m \times n$ matrix $A$, where each column represents a data sample (e.g., patient)
and each row represents a feature (e.g., gene):

$A = (a_{ij})_{m \times n}$
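
The consistency condition stated in the abstract can be illustrated with a small sketch. It assumes the matrix is stored as a list of rows (features) whose entries are indexed by sample, and that class labels are integers 0, ..., r-1. The function names `class_means` and `is_consistent`, the label encoding, and the toy data are illustrative assumptions, not notation from the chapter.

```python
def class_means(vector, labels, num_classes):
    """Average the entries of `vector` separately for each class label."""
    sums = [0.0] * num_classes
    counts = [0] * num_classes
    for value, label in zip(vector, labels):
        sums[label] += value
        counts[label] += 1
    if 0 in counts:
        # Assumption of this sketch: no class is empty.
        raise ValueError("every class needs at least one member")
    return [s / c for s, c in zip(sums, counts)]


def is_consistent(A, feature_class, sample_class, num_classes):
    """Check the consistency condition for a given biclustering.

    A[i][j] is the expression of feature i (row) in sample j (column).
    feature_class[i] and sample_class[j] are class labels in 0..num_classes-1.
    """
    m, n = len(A), len(A[0])

    # Each sample: the average over features of its own class must be
    # strictly greater than the average over features of any other class.
    for j in range(n):
        column = [A[i][j] for i in range(m)]
        means = class_means(column, feature_class, num_classes)
        own = sample_class[j]
        if any(means[k] >= means[own] for k in range(num_classes) if k != own):
            return False

    # Each feature: the same condition with the roles of samples and
    # features exchanged.
    for i in range(m):
        means = class_means(A[i], sample_class, num_classes)
        own = feature_class[i]
        if any(means[k] >= means[own] for k in range(num_classes) if k != own):
            return False

    return True


# Toy data: 3 features (rows), 4 samples (columns), 2 classes.
A = [
    [5, 4, 1, 0],
    [6, 5, 0, 1],
    [1, 0, 7, 6],
]
print(is_consistent(A, [0, 0, 1], [0, 0, 1, 1], 2))  # True
print(is_consistent(A, [1, 1, 0], [0, 0, 1, 1], 2))  # False
```

In the toy data the first two features are highly expressed in the first two samples and the third feature in the last two, so the first labeling is consistent. Swapping the feature labels breaks the condition already at the first sample.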