We terminate this section with a few comments on two points related to the inter-
pretation of differential expression analysis. The first point is that with progressively
better control of the biological and the technical variance, one can anticipate finding,
as soon as the physiological status of the cell differs, a very large number of genes
statistically differentially expressed. In this situation, establishing a list of differen-
tially expressed genes would not help to understand the biology and another criterion
has to be considered for identifying interesting genes. In practice, one often feels that
the list of differentially expressed genes is either uncomfortably small or uncomfortably
large. This is not to say that gene-wise differential expression analysis is pointless:
interpreting differences in expression levels that are not supported from the statistical
viewpoint will often be misleading. When the number of statistically differentially expressed
genes is high, careful fold-change analysis and biological considerations would help
to disentangle the direct consequences of the difference in biological condition, as
opposed to secondary transcriptome modifications. Of note, principled methods to
test differential expression above a given fold-change cut-off have been proposed
(McCarthy and Smyth, 2009) and may become increasingly relevant as technologies
improve and the statistical power provided by better experimental set-ups increases.
The second point worth mentioning is the concern that the actual level of correlation in
expression data might exceed what usual FDR control procedures can accommodate,
thereby leading to a false sense of statistical confidence (Qiu et al., 2005).
Permutation approaches to FDR control are thus preferred by some authors because
of their greater robustness (Tusher et al., 2001; Ge et al., 2003; Grant et al., 2007). It
has also been argued that local FDR procedures, working directly with the test
statistics rather than with the derived p-values, might be more robust to strong
correlation (Strimmer, 2008).
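As an illustration of the permutation idea, the following Python sketch estimates the
FDR associated with a threshold on gene-wise t-statistics by permuting sample labels.
It is not the procedure of any of the cited authors: the function names (welch_t,
permutation_fdr), the choice of the Welch statistic, the fixed threshold and the
assumption that every gene is null when counting expected false positives are all
simplifications made here. Whole samples are permuted, so the correlation between
genes is retained in the null distribution.

```python
import numpy as np


def welch_t(x, y):
    """Gene-wise Welch t-statistics; x and y are (genes, samples) arrays."""
    mean_x, mean_y = x.mean(axis=1), y.mean(axis=1)
    var_x = x.var(axis=1, ddof=1) / x.shape[1]
    var_y = y.var(axis=1, ddof=1) / y.shape[1]
    return (mean_x - mean_y) / np.sqrt(var_x + var_y)


def permutation_fdr(x, y, threshold, n_perm=200, seed=0):
    """Estimate the FDR of calling genes with |t| >= threshold.

    Assumption made for this sketch: all genes are treated as null when
    counting expected false positives, which makes the estimate conservative.
    """
    rng = np.random.default_rng(seed)
    data = np.hstack([x, y])
    n_x = x.shape[1]

    observed = np.abs(welch_t(x, y))
    n_called = int((observed >= threshold).sum())
    if n_called == 0:
        return 0, 0.0

    null_counts = np.empty(n_perm)
    for b in range(n_perm):
        # Permute sample labels (columns) only: each sample keeps its full
        # gene profile, so gene-gene correlation is preserved under the null.
        shuffled = data[:, rng.permutation(data.shape[1])]
        t_null = np.abs(welch_t(shuffled[:, :n_x], shuffled[:, n_x:]))
        null_counts[b] = (t_null >= threshold).sum()

    expected_false = null_counts.mean()
    return n_called, min(1.0, expected_false / n_called)
```

Calling permutation_fdr(x, y, threshold=4.0) on two (genes, samples) arrays returns the
number of genes called and the estimated FDR at that threshold; scanning several
thresholds gives a rough picture of how the estimate changes with the list size.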
4.3 Towards gene expression networks
Networks provide the most relevant representations to explain many gene expression
patterns, with regulatory processes corresponding to network edges and the genes
whose expression is monitored corresponding to network nodes. A second type of node
can be introduced to accommodate quantities that are not directly measured and
need to be inferred, such as transcription factor activities or other variables that
affect gene expression. However, inferring global regulation networks that
map faithfully to the underlying biology is usually not possible from gene expression
data alone, due to the impact of missing observations such as post-transcriptional
processes, metabolites and environmental variables. In practice, many popular
approaches tackle less ambitious problems by introducing more ad hoc concepts,
such as “influential networks” whose edges represent direct and indirect relationships;
underlying “expression modes” or “eigengenes” that can be likened to
transcription factor activities; and gene clusters that can be related to actual biological
regulons. A second route to network inference from expression data is to focus
on a subnetwork and to incorporate prior biological knowledge and additional data
sources. In this section, we give a selective overview of the first set of approaches
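
The notion of expression modes can be illustrated with a singular value decomposition
of the gene-centred expression matrix, which is one common way of defining eigengenes.
The Python sketch below is a minimal illustration under that assumption; the function
name eigengenes and its arguments are hypothetical, and it is not meant to reproduce
any particular published method.

```python
import numpy as np


def eigengenes(expr, n_modes=2):
    """Extract expression modes from a (genes, samples) matrix by SVD.

    Returns the first n_modes eigengenes (one profile across samples per row),
    the gene loadings on each mode, and the fraction of variance each explains.
    """
    # Centre each gene so that modes describe variation around its mean level.
    centered = expr - expr.mean(axis=1, keepdims=True)
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    explained = s**2 / np.sum(s**2)
    modes = vt[:n_modes]                   # shape (n_modes, samples)
    loadings = u[:, :n_modes] * s[:n_modes]  # shape (genes, n_modes)
    return modes, loadings, explained[:n_modes]
```

Each row of the returned modes array is a profile across samples, and the loadings
indicate how strongly each gene follows that profile. Whether a mode can be read as the
activity of an actual transcription factor remains a matter of biological interpretation.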