ples with percent coefficients of variation usually in the low single digits [24, 29].
In contrast, protein expression data are produced by technologies that are more
complicated and harder to standardize. The technical reproducibility of protein
expression data collected from identical samples often shows percent coefficients
of variation in the low double-digit range [22, 31].
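The percent coefficient of variation compared above is the sample standard deviation divided by the mean, times 100. A minimal sketch follows; the function name and the replicate values are hypothetical and were chosen only to show one low single-digit and one low double-digit case.

```python
import statistics


def percent_cv(replicates):
    """Percent coefficient of variation: 100 * sample standard deviation / mean."""
    mean = statistics.mean(replicates)
    if mean == 0:
        raise ValueError("the CV is undefined when the mean is zero")
    return 100.0 * statistics.stdev(replicates) / mean


# Hypothetical replicate measurements of one item taken from identical samples.
tight = [100.0, 102.0, 98.0]   # low single-digit CV
noisy = [80.0, 100.0, 120.0]   # low double-digit CV
print(round(percent_cv(tight), 1), round(percent_cv(noisy), 1))  # prints: 2.0 20.0
```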
10.3. Correlation Computations
We employ the aforementioned 30 samples to compute a correlation matrix. The
matrix entry at location (i, j) denotes the correlation coefficient between the ith
and jth items (genes or proteins), which lies in the range [-1.0, 1.0]. Because
mRNA arrays alone can measure over 45,000 different values, we may be faced
with making sense of over a billion candidate pairs (45,000 × 44,999 / 2 ≈ 1.0 × 10^9).
Close examination of the data reveals a paucity of outliers, so we are able to use
the well-known Pearson method to compute correlation coefficients. Because we are
searching for putative pathways and networks, positive and negative correlations are
of equal interest, so we take absolute correlation values. Recall that these are
biological and hence noisy data: not every probe set is reliably measured in every
sample, so the number of samples supporting each coefficient can differ from pair
to pair. Thus we move away from simple correlation and compute a p-value for each
pair of correlates: the probability of observing a correlation at least this strong
if the true correlation were zero [33]. See Fig. 10.2.
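The per-pair computation described above can be sketched as follows. This is a minimal sketch, not the authors' implementation: it assumes missing measurements are stored as None and dropped pairwise, and it approximates the significance test with the Fisher z-transformation (z = atanh(r) * sqrt(n - 3), compared against a standard normal), which may differ from the test described in [33]. The function names and sample values are hypothetical.

```python
import math


def pairwise_pearson(x, y):
    """Pearson r over the samples where both items were measured.

    Missing measurements are assumed to be encoded as None and are dropped
    pairwise. Returns (r, n) where n is the number of shared samples.
    """
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    if n < 4:
        raise ValueError("need at least 4 shared samples for the z approximation")
    mean_x = sum(a for a, _ in pairs) / n
    mean_y = sum(b for _, b in pairs) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in pairs)
    sxx = sum((a - mean_x) ** 2 for a, _ in pairs)
    syy = sum((b - mean_y) ** 2 for _, b in pairs)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return sxy / math.sqrt(sxx * syy), n


def correlation_p_value(r, n):
    """Two-sided p-value for H0: true correlation = 0 (Fisher z approximation)."""
    r = max(min(r, 1 - 1e-12), -1 + 1e-12)  # keep atanh finite when |r| = 1
    z = math.atanh(r) * math.sqrt(n - 3)
    return math.erfc(abs(z) / math.sqrt(2))  # P(|Z| >= |z|) for standard normal Z


# Hypothetical expression profiles; the last sample of gene_a was not measured.
gene_a = [1.0, 2.0, 3.0, 4.0, 5.0, None]
gene_b = [2.0, 1.0, 4.0, 3.0, 5.0, 9.0]
r, n = pairwise_pearson(gene_a, gene_b)
weight = abs(r)  # sign is ignored: positive and negative correlations count alike
p = correlation_p_value(r, n)
print(n, round(weight, 3), round(p, 3))
```

Because n is returned alongside r, a pair supported by only a few shared samples receives a larger p-value than an equally strong pair measured in all 30 samples, which is why p-values are more comparable across pairs than raw coefficients.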
From this we can build a simple, unweighted graph as needed by applying a
high-pass filter with a cut-off value (we favor p = 0.01). An edge whose
weight is less than the cut-off is discarded. The remaining edges are retained,
but their weights are thereafter ignored.
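A minimal sketch of this high-pass filter, assuming the weighted graph is stored as a dictionary from vertex pairs to weights. The text does not say whether the compared weight is a correlation strength or a p-value, so the hypothetical smaller_is_stronger flag covers both readings: with p-values, passing the filter means p at or below the cut-off. All names and values are illustrative.

```python
def threshold_graph(weighted_edges, cutoff, smaller_is_stronger=False):
    """High-pass filter a weighted graph into an unweighted adjacency map.

    weighted_edges maps (u, v) vertex pairs to a weight. By default an edge is
    kept when its weight is at least `cutoff` (e.g. |r|); set
    smaller_is_stronger=True when the weight is a p-value, so that an edge is
    kept when p <= cutoff. Weights are discarded in the result.
    """
    adjacency = {}
    for (u, v), w in weighted_edges.items():
        # Every vertex stays in the graph, even if all its edges are filtered out.
        adjacency.setdefault(u, set())
        adjacency.setdefault(v, set())
        keep = w <= cutoff if smaller_is_stronger else w >= cutoff
        if keep:
            adjacency[u].add(v)
            adjacency[v].add(u)
    return adjacency


# Hypothetical pairwise p-values between two genes and one protein.
p_values = {("g1", "g2"): 0.001, ("g1", "p1"): 0.2, ("g2", "p1"): 0.004}
graph = threshold_graph(p_values, cutoff=0.01, smaller_is_stronger=True)
print({v: sorted(nbrs) for v, nbrs in sorted(graph.items())})
# prints {'g1': ['g2'], 'g2': ['g1', 'p1'], 'p1': ['g2']}
```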
10.4. Clique and Its Variants
We assume the reader is familiar with standard concepts in graph and complexity
theory [25, 30]. We begin with the well-known clique problem. A clique is a
densest possible subgraph: every pair of its vertices is connected by an edge. A
clique is maximum if it is a largest clique in the graph. A clique is maximal if it
is not contained wholly within a larger clique. A clique on five vertices is
illustrated in Fig. 10.3. Protein correlations are too weak to reveal relevant
relationships at this level, so for proteins we turn to other methods, described
in Section 10.6. The correlation matrix is transformed into a complete, weighted
correlation graph by using a vertex for each transcript and protein, and by
weighting the edge between each pair of items with the corresponding correlation
matrix entry.
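The clique definitions above can be made concrete in code. The sketch below uses the Bron-Kerbosch algorithm with pivoting, a standard textbook method for listing every maximal clique; it is swapped in for illustration and is not necessarily the approach used in this chapter. A maximum clique is then simply a largest entry of that list. The graph is assumed to be an unweighted, symmetric adjacency map (for example, the result of thresholding the correlation graph), and all function and vertex names are hypothetical.

```python
def maximal_cliques(adjacency):
    """List all maximal cliques with the Bron-Kerbosch algorithm (with pivoting).

    adjacency maps each vertex to the set of its neighbours; the graph is
    assumed undirected (symmetric sets) and free of self-loops.
    """
    cliques = []

    def expand(r, p, x):
        # r: current clique; p: candidates that extend it; x: vertices already tried.
        if not p and not x:
            cliques.append(r)  # nothing can extend r, so it is maximal
            return
        pivot = max(p | x, key=lambda u: len(adjacency[u] & p))
        for v in list(p - adjacency[pivot]):
            expand(r | {v}, p & adjacency[v], x & adjacency[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adjacency), set())
    return cliques


def from_edges(edges):
    """Build a symmetric adjacency map from an iterable of (u, v) pairs."""
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    return adjacency


# Hypothetical graph: a clique on five vertices (a-e) plus a vertex f joined only to a.
five = "abcde"
edges = [(u, v) for i, u in enumerate(five) for v in five[i + 1:]] + [("a", "f")]
graph = from_edges(edges)
cliques = maximal_cliques(graph)
maximum = max(cliques, key=len)  # a largest maximal clique is a maximum clique
print(sorted(sorted(c) for c in cliques), sorted(maximum))
# prints [['a', 'b', 'c', 'd', 'e'], ['a', 'f']] ['a', 'b', 'c', 'd', 'e']
```

Here {a, f} is maximal (no vertex can be added to it) but not maximum. Listing all maximal cliques can take exponential time in the worst case, so it is usually practical only on a sparse, thresholded graph.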
Clique is widely acknowledged for its many applications in computational molec-