used to delineate the set of genes and transcripts present in a given genome or cellular condition, they cannot be used to monitor transcript abundances.

DNA Microarrays
DNA microarrays (or DNA chips) have been the most commonly used technique during the last two decades to globally monitor cellular abundances of transcript species. A DNA microarray is a collection of microscopic DNA spots attached to a solid surface. Each DNA spot contains many thousands of copies of a specific DNA sequence, known as probes. These usually correspond to a short section of a gene, generally at the 3′ end. Each microarray includes one or a few probe sets for each interrogated gene. These are used to hybridize a cDNA sample (the target) under high-stringency conditions. Probe-target hybridization is usually detected and quantified by detection of fluorophore-, silver-, or chemiluminescence-labeled targets to determine the relative abundance of transcripts in the target sample. Data on about 700 000 sample hybridizations performed on DNA microarrays are accessible through the databases Gene Expression Omnibus (GEO) at NCBI, and ArrayExpress at EBI.
Because DNA microarrays require spotting of nucleotide probes corresponding to known transcripts, only the abundances of these transcripts can be monitored. Quantification of previously unknown transcripts (often specific to the particular cell type being interrogated, and therefore particularly relevant to the phenotype of this cell type) is impossible. Moreover, probes are usually shared between multiple splice forms of the same gene, and unless specific array designs are employed, it is impossible to deconvolute the abundances of individual alternative transcript isoforms from overall gene expression.

RNASeq
Thanks to impressive technological advances during the last decade, massively parallel sequencing appears to be providing the sequencing throughput required for direct sequencing of cDNA libraries (RNASeq). In the most cost-effective and popular approaches for transcriptome characterization (i.e., those using the Illumina platform), many very short sequence tags or reads (usually 50–100 bp long) are obtained along the transcripts in the interrogated RNA population. Sequenced reads are used to reconstruct transcript sequences. They are also used to quantify transcript abundances, under the assumption that the number of sequence reads originating from a given transcript is roughly proportional to transcript abundance (see Wang et al. [20] and Mortazavi et al. [21] for introductions). RNASeq has a dynamic range similar to or larger than that of DNA microarrays [22], allowing in addition for the discovery of novel genes and the quantification of alternative splice isoforms. It combines both transcript discovery, as in EST projects, and transcript quantification, as in DNA microarrays.
Since the first RNASeq experiments, sample preparations and protocols have been developed to target specific RNA populations within the cell (i.e., short vs. long RNAs, polyadenylated vs. non-polyadenylated, etc.) or to enrich for specific domains within the transcripts (5′ or 3′ ends, etc.). These different protocols share a common set of elementary components. With a few exceptions, massively parallel sequencing instruments are unable to directly sequence RNA, and therefore preparation of a cDNA library from the RNA sample of interest is, as with EST sequencing and DNA microarrays, the first step in the experimental protocol. In contrast to ESTs and microarrays, random priming is a commonly used alternative to polydT priming. This usually leads to a more uniform representation along the entire length of the transcript sequence in the library, minimizing the 3′ bias common to polydT priming. Next, fragmentation of the cDNA sequences is necessary owing to the technological limitations of current sequencing technologies and their inability to obtain long (>100 bp) sequence reads. The most popular approaches for cDNA fragmentation are enzymatic digestion, nebulization, and hydrolysis. To minimize biases arising from reverse transcription, some protocols postpone this step until after fragmentation. Then, during final library preparation, adapter sequences are ligated to both ends of double-stranded cDNA molecules. These mediate the binding of fragments to beads in the sequencing medium, and harbor primer binding sites for amplification. Before sequencing, the primary library is amplified using polymerase chain reaction (PCR), as most instruments cannot sequence single nucleic acid molecules. To keep amplification biases under control, usually a size selection step that homogenizes fragment length is performed prior to amplification. Size selection is generally implemented by gel electrophoresis. In the sequencing step one arbitrary end (single reads) or both ends (paired end reads) of the cDNA fragments in the library are sequenced.
Sequence reads obtained after sequencing are used to infer transcript sequences and transcript abundances. Broadly, reads are first mapped to the genome and to the transcriptome of the organism investigated. If the genome is not available, reads may be assembled de novo into transcript contigs. From the mapped reads, it is possible to infer novel transcriptional elements (splice junctions, exons, transcripts and genes), and to quantify transcript abundances. Gene and transcript abundances are usually measured in reads per kilobase per million mapped reads (RPKM) [21]. This allows for normalization across experiments sequenced at very different depths. Although it is difficult to extrapolate the number of RNA copies per cell from the RPKM values in the absence of knowledge of the quantity of RNA in the cellular fraction
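As an illustration of the RPKM measure mentioned in the text, the following minimal Python sketch computes reads per kilobase per million mapped reads from a raw read count. The function name `rpkm`, its parameter names, and the toy numbers are assumptions chosen for this example only; they do not come from the cited references or from any particular software package.

```python
def rpkm(read_count, length_bp, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads.

    read_count         -- reads mapped to the gene or transcript
    length_bp          -- length of the gene or transcript in base pairs
    total_mapped_reads -- all mapped reads in the experiment
    """
    if length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("length and total mapped reads must be positive")
    # Dividing by (length / 1e3) and by (total / 1e6) is the same as
    # multiplying the count by 1e9 and dividing by length * total.
    return read_count * 1e9 / (length_bp * total_mapped_reads)


# Hypothetical toy case: the same 2000 bp gene measured in two
# experiments sequenced at different depths.
shallow = rpkm(read_count=200, length_bp=2000, total_mapped_reads=10_000_000)
deep = rpkm(read_count=1000, length_bp=2000, total_mapped_reads=50_000_000)

print(f"shallow experiment: {shallow:.1f} RPKM")
print(f"deep experiment:    {deep:.1f} RPKM")
```

In this toy case the deeper experiment yields five times as many reads for the gene, yet both experiments give the same RPKM value, which is the sense in which the measure normalizes for sequencing depth. Dividing by length likewise keeps long transcripts from appearing more abundant simply because they produce more fragments.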