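[Figure: an n × m matrix of expression values w_ij (w_11, w_12, ..., w_1m in the first row through w_n1, w_n2, ..., w_nm in the last); rows correspond to genes and columns to samples/conditions.]
Figure 11.3 Microarray data matrix.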
Example 11.12 Gene expression. Gene expression matrices are popular in bioinformatics research and
development. For example, an important task is to classify a new gene using the expression
data of the gene and that of other genes in known classes. Symmetrically, we may
classify a new sample (e.g., a new patient) using the expression data of the sample and
that of samples in known classes (e.g., tumor and nontumor). Such tasks are invaluable
in understanding the mechanisms of diseases and in clinical treatment.
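The two symmetric classification tasks above can be illustrated with a minimal sketch. It assumes a nearest-neighbor rule with Euclidean distance, which is only one possible classifier and is not prescribed by the text; the matrix values, the class labels, and the helper names `transpose` and `nearest_neighbor_label` are all made up for illustration.

```python
import math

# Toy expression matrix: rows are genes, columns are samples/conditions.
# All values and labels below are invented for illustration.
W = [
    [5.1, 4.8, 1.0, 1.2],   # gene 1
    [4.9, 5.3, 0.8, 1.1],   # gene 2
    [1.0, 1.3, 4.7, 5.0],   # gene 3
]
sample_labels = ["tumor", "tumor", "nontumor", "nontumor"]
gene_labels = ["class A", "class A", "class B"]  # hypothetical gene classes


def transpose(matrix):
    """Swap rows and columns, so that samples become the rows."""
    return [list(column) for column in zip(*matrix)]


def nearest_neighbor_label(profiles, labels, query):
    """Return the label of the known profile closest to `query`."""
    distances = [math.dist(profile, query) for profile in profiles]
    return labels[distances.index(min(distances))]


# Classify a new sample: its profile has one value per gene, so it is
# compared against the columns of W (the rows of the transposed matrix).
new_sample = [4.7, 5.0, 1.4]
predicted_sample_class = nearest_neighbor_label(
    transpose(W), sample_labels, new_sample
)

# Classify a new gene: its profile has one value per sample, so it is
# compared against the rows of W directly.
new_gene = [1.2, 0.9, 5.1, 4.6]
predicted_gene_class = nearest_neighbor_label(W, gene_labels, new_gene)

print(predicted_sample_class, predicted_gene_class)
```

The only difference between the two tasks in this sketch is whether the matrix is read by rows or by columns, which is the symmetry the example describes.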
As can be seen, many gene expression data mining problems are highly related to
cluster analysis. However, a challenge here is that, instead of clustering in one dimension
(e.g., gene or sample/condition), in many cases we need to cluster in two dimensions
simultaneously (e.g., both gene and sample/condition). Moreover, unlike the clustering
models we have discussed so far, a cluster in a gene expression data matrix is a submatrix
and usually has the following characteristics:
Only a small set of genes participate in the cluster.
The cluster involves only a small subset of samples/conditions.
A gene may participate in multiple clusters, or may not participate in any cluster.
A sample/condition may be involved in multiple clusters, or may not be involved in any cluster.
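The characteristics listed above can be made concrete with a small sketch in which a cluster is stored as a pair (set of gene indices, set of sample indices) and the corresponding submatrix is extracted on demand. The matrix values, the two clusters, and the helper names `submatrix` and `membership_count` are hypothetical and chosen only to show overlapping and empty memberships.

```python
# Toy 4 x 5 gene expression matrix (invented values); W[i][j] is the
# expression level of gene i under sample/condition j.
W = [
    [2.0, 2.1, 7.5, 0.3, 2.0],
    [2.0, 1.9, 7.4, 6.8, 2.1],
    [9.0, 0.2, 7.6, 6.9, 4.4],
    [3.3, 8.1, 0.5, 1.7, 6.2],
]

# Each hypothetical cluster is a pair (gene indices, sample indices).
biclusters = [
    ({0, 1}, {0, 1, 2}),
    ({1, 2}, {2, 3}),
]


def submatrix(matrix, genes, samples):
    """Extract the rows in `genes` and the columns in `samples`."""
    return [[matrix[i][j] for j in sorted(samples)] for i in sorted(genes)]


def membership_count(index, clusters, axis):
    """Count the clusters containing `index` (axis 0: genes, axis 1: samples)."""
    return sum(1 for cluster in clusters if index in cluster[axis])


gene_counts = [membership_count(i, biclusters, 0) for i in range(len(W))]
sample_counts = [membership_count(j, biclusters, 1) for j in range(len(W[0]))]

print(submatrix(W, *biclusters[1]))
print(gene_counts)
print(sample_counts)
```

In this toy data, gene 1 and sample 2 each belong to both clusters, while gene 3 and sample 4 belong to none, matching the last two characteristics in the list.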
To find clusters in gene-sample/condition matrices, we need new clustering techniques
that meet the following requirements for biclustering:
A cluster of genes is defined using only a subset of samples/conditions.
A cluster of samples/conditions is defined using only a subset of genes.
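To see why the first requirement matters, consider the following sketch, which compares two gene profiles once over all conditions and once over a subset only. Euclidean distance is assumed here purely for illustration; the profiles and the helper name `distance_on` are invented.

```python
import math


def distance_on(profile_a, profile_b, columns):
    """Euclidean distance between two profiles, restricted to `columns`."""
    return math.sqrt(sum((profile_a[j] - profile_b[j]) ** 2 for j in columns))


# Two invented gene profiles over five samples/conditions. They agree
# closely on conditions 0-2 and differ sharply on conditions 3 and 4.
gene_a = [1.0, 1.1, 0.9, 8.0, 0.2]
gene_b = [1.0, 1.0, 1.0, 0.5, 7.9]

full_distance = distance_on(gene_a, gene_b, range(5))
restricted_distance = distance_on(gene_a, gene_b, [0, 1, 2])

print(round(full_distance, 2), round(restricted_distance, 2))
```

Measured over all five conditions the two genes look unrelated, yet on conditions 0 to 2 they are nearly identical, so they can form a cluster only if the cluster is defined on that subset. The second requirement is the same argument with the roles of genes and samples/conditions exchanged.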
 