changes, for instance, via pairwise log2-scale scatter plots of the data, may identify
the origin of the problem and it can be sound to discard a particular sample.
4.2 Expression and differential expression
Even if many, and often a majority, of the genes display a non-ambiguous expression
signal, establishing comprehensive lists of expressed versus non-expressed regions
from a microarray expression data set turns out to be difficult. The problem stems
from background optical and hybridization noise that generates a non-zero baseline,
imposing a lower limit on detection. In practice, it is not possible to say whether
a gene is not expressed or is expressed below the detection limit. This is inherent to the
technology and cannot be compensated for by increasing the number of experiments
or probes, although sensitivity differs between platforms. One motivation for pairing each
Perfect Match (PM) probe with a so-called Mismatch (MM) probe that differs by a
single nucleotide substitution in the Affymetrix GeneChip design was precisely to
address this issue directly: p-values for a PM signal above the MM signal can be
computed and interpreted as a direct assessment of the presence of a transcript. From
a number of literature reports, it seems that the benefit of these MM probes was not
obvious (see, for instance, Bolstad et al., 2004) and most custom arrays, as well as
newer Affymetrix designs, do not include MM probes.
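To make the idea of a PM-above-MM p-value concrete, the following is a minimal sketch using an exact one-sided sign test on the paired probes of one probe set. The sign test is a deliberately simple stand-in chosen for illustration; it is not claimed to be the test implemented in the Affymetrix software. The function name and the intensity values are hypothetical.

```python
from math import comb


def sign_test_pm_above_mm(pm, mm):
    """Exact one-sided sign test for paired probes.

    Null hypothesis: a PM probe is as likely to be below its MM partner
    as above it. A small p-value suggests PM signal exceeds MM signal.
    """
    if len(pm) != len(mm):
        raise ValueError("pm and mm must contain the same number of probes")
    # Tied pairs carry no information about direction and are dropped.
    diffs = [p - m for p, m in zip(pm, mm) if p != m]
    n = len(diffs)
    if n == 0:
        return 1.0
    k = sum(d > 0 for d in diffs)  # number of pairs with PM > MM
    # P(X >= k) for X ~ Binomial(n, 1/2)
    return sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n


# Hypothetical raw intensities for one probe set of 11 PM/MM pairs.
pm = [812, 640, 955, 701, 530, 880, 760, 690, 910, 575, 840]
mm = [410, 388, 520, 450, 505, 430, 395, 470, 480, 560, 415]
p_value = sign_test_pm_above_mm(pm, mm)
print(f"one-sided sign-test p-value: {p_value:.4g}")
```

The sign test only uses the direction of each PM/MM difference, so it ignores how large the differences are. A rank-based test would use more of the information in the data, at the cost of a longer implementation.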
The general procedure is to call “expressed” those genes having an aggregated
expression signal that scores above a certain level (Rasmussen et al., 2009), or a
distribution of probe-level intensities that differs significantly from an estimated
background distribution (Zhou and Abagyan, 2002). Unfortunately, whatever the
validity of the approach, the number of genes called expressed will be related not only
to the underlying biology but also to the specific signal-to-noise ratios achieved in a
particular experiment (or hybridization). Importantly, these technical variations cannot
be filtered out by normalization. Fortunately, asking whether a gene is expressed or not
may not be the most biologically relevant question when analysing bacterial microarray
data sets. Indeed, the current microarray technologies allow for the detection of a
consistent expression signal for transcripts with abundance orders of magnitude below
one copy per cell. Interpreting this expression signal in terms of coexistence of different
subpopulations of cells and/or of background stochastic transcription noise seems
therefore more relevant than simply establishing a list of expressed genes.
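The dependence of expression calls on signal-to-noise ratio can be illustrated with a small sketch. It assumes that a set of background (negative-control) probe intensities is available. It calls a gene “expressed” when the median of its probe intensities is exceeded by at most a fraction alpha of the background probes. This is a simplified empirical rule for illustration only, not the procedure of the cited papers; all names and log2 intensity values are hypothetical.

```python
from statistics import median


def call_expressed(gene_probes, background, alpha=0.05):
    """Return {gene: True/False} detection calls.

    gene_probes: dict mapping gene name to a list of log2 probe intensities.
    background:  list of log2 intensities from background probes (assumed).
    alpha:       tolerated fraction of background probes at or above the
                 aggregated gene signal.
    """
    if not background:
        raise ValueError("background must not be empty")
    calls = {}
    for gene, intensities in gene_probes.items():
        signal = median(intensities)  # aggregated expression signal
        exceed = sum(b >= signal for b in background)
        calls[gene] = exceed / len(background) <= alpha
    return calls


# Hypothetical log2 intensities for three genes.
genes = {
    "geneA": [8.8, 9.0, 9.3],  # strong signal
    "geneB": [5.9, 6.0, 6.2],  # within the background range
    "geneC": [7.4, 7.5, 7.7],  # weak signal
}

# Two hybridizations with the same biology but different noise levels.
background_low_noise = [5.0 + 0.1 * i for i in range(20)]   # 5.0 to 6.9
background_high_noise = [5.0 + 0.2 * i for i in range(20)]  # 5.0 to 8.8

calls_low = call_expressed(genes, background_low_noise)
calls_high = call_expressed(genes, background_high_noise)
print(calls_low)
print(calls_high)
```

In this toy example the weakly expressed geneC is called expressed in the low-noise hybridization but not in the high-noise one, although its intensities are identical in both. The list of called genes therefore changes with the noise level alone.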
The situation is different when the purpose is not to compare lists of transcribed/
non-transcribed regions between different conditions but to map new transcripts, a
question that typically arises in tiling array data analysis. The authors mapped new
transcripts using a compendium of 269 tiling array hybridizations by relying on a
probabilistic signal smoothing procedure to reconstruct the transcriptional landscape along
the chromosome from each hybridization (Figure 6.3; Nicolas et al., 2012). The
underlying probabilistic model accounts for abrupt shifts such as transcription breakpoints
that correspond to transcription start and termination sites and smaller signal drifts that
can reflect artefacts, or reveal expression gradients generated by the interplay of 5′ to 3′
synthesis and “random” termination. For each probe, this led to an expected