The mean and variance of the values in row 3 are

\[ \bar{x}_3 = \frac{x_{31} + x_{32} + \cdots + x_{3n}}{n} \]

and

\[ s_3^2 = \frac{(x_{31} - \bar{x}_3)^2 + (x_{32} - \bar{x}_3)^2 + \cdots + (x_{3n} - \bar{x}_3)^2}{n - 1}. \]
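Each value in the third row is then normalized to the value

\[ y_{3j} = \frac{x_{3j} - \bar{x}_3}{s_3}, \tag{12-2} \]

where $j = 1, 2, \ldots, n$ and $s_3$ is the standard deviation of row 3, that is, the square root of its variance.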
The result of applying this normalization to the data depicted in
Figure 12-9(A) is presented in Figure 12-9(B). Genes A and C have been
brought closer together, genes E and F have remained close, and
gene D differs in expression from both groups.
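The row normalization in equation (12-2) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `normalize_row` and the sample values in `row_3` are hypothetical, and the divisor n - 1 is taken from the variance formula above.

```python
from statistics import mean, stdev


def normalize_row(values):
    """Rescale one row so it has mean 0 and sample standard deviation 1."""
    row_mean = mean(values)
    row_sd = stdev(values)  # sample standard deviation (divisor n - 1)
    if row_sd == 0:
        # A constant row has no variation, so equation (12-2) is undefined.
        raise ValueError("a constant row cannot be normalized")
    return [(x - row_mean) / row_sd for x in values]


# Hypothetical expression values for one gene across four tissues
# (made up for illustration, not taken from Figure 12-9).
row_3 = [2.0, 4.0, 6.0, 8.0]
normalized = normalize_row(row_3)
print([round(y, 3) for y in normalized])
```

After normalization every row has mean 0 and standard deviation 1, so two genes whose expression rises and falls in the same tissues end up with similar profiles even when their absolute expression levels differ.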
B. Cluster Analysis Fundamentals
In the hypothetical example depicted in Figure 12-9(A), we were able to
visually identify the similarities in gene expression patterns across the
tissues. In practice, because the expression levels of thousands of genes
are measured across a large number of tissues, identifying similarities
by eye is impossible. We want to be able to discover
patterns in the data—for instance, which genes are turned off in cancer
cells and which genes are turned on. Further, we want to know how
the patterns of gene expression vary from tumor type to tumor type. If
there are patterns of gene expression that are common to certain tumor
types, this may be indicative of common functionality. The fact that
some genes are expressed in similar patterns does not necessarily mean
that their gene products interact with each other, but they might. If we
do not know which genes are co-expressed, we cannot study them to
determine whether they are interacting. Clearly, we need a quantitative
method that will allow us to detect these patterns reliably, so that we can
find the "needles" of specific gene information in this "haystack" of data
regarding thousands of genes.
The methods available to classify co-expressed genes into groups can
be broadly divided into two categories called supervised and
unsupervised learning. In supervised learning, the genes are divided into
a fixed number of predefined groups. These could be defined
qualitatively, for example as "diseased" or "normal," or
quantitatively, by their number. In unsupervised learning, the
genes are grouped into categories based on similarities in their
expression profiles. The computational method used to perform the
partition into groups is generally referred to as cluster analysis.
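To make the idea of unsupervised grouping concrete, the sketch below places genes in the same group when their normalized profiles are close in Euclidean distance. It is only one simple choice (single-linkage grouping with a distance cutoff), assumed here for illustration and not necessarily the method the text goes on to use. The function name `group_by_threshold`, the cutoff of 0.5, and the profile values are all hypothetical.

```python
from math import dist


def group_by_threshold(profiles, threshold):
    """Group genes connected by chains of pairwise distances below threshold.

    profiles: dict mapping gene name -> list of normalized expression values.
    """
    names = list(profiles)
    # Start with every gene in its own group.
    group_of = {name: i for i, name in enumerate(names)}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if dist(profiles[a], profiles[b]) < threshold:
                old, new = group_of[b], group_of[a]
                if old != new:
                    # Merge the whole group containing b into a's group.
                    for name in names:
                        if group_of[name] == old:
                            group_of[name] = new
    groups = {}
    for name in names:
        groups.setdefault(group_of[name], []).append(name)
    return sorted(groups.values())


# Made-up normalized profiles across three tissues (not from Figure 12-9).
profiles = {
    "A": [-1.0, 0.0, 1.0],
    "C": [-0.9, -0.1, 1.0],
    "D": [0.0, 1.0, -1.0],
    "E": [1.0, 0.0, -1.0],
    "F": [1.0, 0.1, -1.1],
}
print(group_by_threshold(profiles, threshold=0.5))
# → [['A', 'C'], ['D'], ['E', 'F']]
```

No group labels are supplied in advance; the groups emerge from the distances alone, which is what distinguishes this from the supervised setting. The result depends on the cutoff and on the distance measure chosen.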