approach to the understanding and modeling of the pathways
involved in RNA synthesis and processing.
In this chapter we will first briefly review these cellular
pathways. We will then review the experimental and computational
methods that are commonly used to characterize and monitor
cellular transcriptomes, and then summarize the efforts leading
to the definition of the human reference transcriptome. We use
the term reference transcriptome to denote the set of all genes
and transcripts potentially encoded in a given genome. We will
next describe the characteristic features of the human
transcriptome. We will deal separately with the protein-coding
transcriptome (that is, the set of genes and transcripts that
code for proteins) and the non-coding transcriptome (the set of
genes and transcripts that are not translated into proteins).
Many of the characteristic features of the human transcriptome
can be extrapolated to other mammalian, vertebrate, and even
metazoan genomes. In the final section we will review recent
work, mostly based on RNA-Seq profiling, contributing to the
characterization of the expressed transcriptome. In contrast to
the reference transcriptome, the expressed transcriptome refers
to the set of genes and transcripts that are expressed in a given
condition, and which are therefore responsible for cellular
specificity.

THE PATHWAY FROM DNA TO PROTEIN SEQUENCES

The pathway leading from DNA to protein and functional RNA
sequences includes a number of steps, which are relatively well
characterized. The first step is the transcription of DNA into
RNA. During transcription, RNA polymerase copies the DNA template
into a complementary RNA molecule. Specific DNA sequences in the
5′ upstream region of genes (the so-called promoter region) act
as binding sites for proteins called transcription factors that
recruit the RNA polymerase. Transcription factors interact in
these regions with sequence-specific elements or motifs, the
transcription factor-binding sites. These are typically 5 to 8
nucleotides long, and one promoter region usually contains many
of them to harbor different transcription factors. The interplay
between these factors is not well understood, but in eukaryotes
the motifs appear to be arranged in specific configurations that
confer on each gene an individualized spatial and temporal
transcription program. In eukaryotes, the promoters of between
10% and 20% of all genes contain a TATA box (sequence TATAAA),
which binds a TATA-binding protein, which in turn assists in the
formation of the RNA polymerase transcriptional complex. Specific
modifications in the histones (the proteins that form the
nucleosomes, the basic DNA packaging unit) regulate the binding
of transcription factors to the promoter region, thereby
contributing to the control of gene expression.

Whereas transcription of prokaryotic protein-coding genes creates
a messenger RNA molecule (mRNA) which is ready for translation,
transcription of eukaryotic genes produces a primary transcript
of RNA (pre-mRNA) which undergoes a series of modifications
before becoming a mature mRNA. These include 5′ capping, which
involves a set of enzymatic reactions that modify the 5′ end of
the pre-mRNA and thus protect the RNA from degradation by
exonucleases. Another modification is 3′ cleavage and
polyadenylation. These occur if the polyadenylation signal
sequence (5′-AAUAAA-3′) is present near the 3′ end of the
pre-mRNA sequence. The pre-mRNA is first cleaved and then a
series of about 200 adenines is added to form a 3′ poly(A) tail,
which protects the RNA from degradation. The poly(A) tail is
bound by multiple poly(A)-binding proteins necessary for mRNA
export to the cytosol.

The most notable modification in eukaryotic pre-mRNA is RNA
splicing. During the process of splicing, an RNA-protein
catalytic complex known as the spliceosome catalyzes two
trans-esterification reactions in which intervening sequences in
the pre-mRNA (the introns) are excised and released in the form
of lariat structures, and neighboring sequences (the exons) are
concatenated together to form the mature mRNA. Often, introns or
exons can be either removed or retained in the mature mRNA. This
so-called alternative splicing creates a series of different
transcripts originating from a single gene, increasing the
transcriptional complexity beyond that simply reflected in gene
number. Until very recently, splicing was assumed to occur mostly
in pre-mRNA sequences destined to be translated into proteins
(see below). However, an emerging class of long RNA molecules
that lack protein-coding capacity (the long non-coding RNAs,
lncRNAs) seem to be subjected to the same splicing process as
mRNAs.

In eukaryotes, mRNAs are usually exported to the cytoplasm, where
they are translated into proteins. LncRNAs, although they
function mostly in the nucleus, may also be transported to the
cytosol and localize to specific subcellular compartments. During
translation, mRNAs are decoded by the ribosome to produce
specific amino acid chains, or polypeptides. Initiation of
translation involves the ribosome binding to the 5′ end of the
mRNAs with the help of a number of proteins known as initiation
factors. The nucleotides in the RNA are then 'read' by the
ribosome in consecutive non-overlapping triplets, known as
codons. Each codon is translated to a specific amino acid. The
equivalence of codons and amino acids, known as the genetic code,
is implemented through the collection of tRNAs in the cell. Each
tRNA carries at one end the so-called anticodon sequence, a
triplet that will base-pair with the codons in the mRNA sequence,
and is charged with the corresponding specific amino acid at the
other end. The ribosome induces the binding of tRNAs with
complementary anticodon sequences to the triplet sequences in the
mRNA.
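The steps of the pathway described in this section (recognition of a
TATA box, transcription, 3′ cleavage and polyadenylation, splicing,
and translation) can be illustrated with a toy sketch. Everything
below is a simplification for illustration only: the function names,
sequences, and intron coordinates are hypothetical, the introns are
supplied by hand rather than recognized as a spliceosome would, the
cleavage offset downstream of the AAUAAA signal is an assumed
parameter, and translation is assumed to start at the first AUG,
which the text above does not discuss.

```python
# Toy model of the pathway DNA -> pre-mRNA -> mature mRNA -> polypeptide.
# All names, sequences and coordinates here are hypothetical examples.

# Base pairing used when copying a DNA template strand into RNA.
COMPLEMENT = {"A": "U", "T": "A", "G": "C", "C": "G"}

# Standard genetic code; codons enumerated in U, C, A, G order ("*" = stop).
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))


def has_tata_box(promoter_dna):
    """Report whether the promoter contains the TATA box motif TATAAA."""
    return "TATAAA" in promoter_dna


def transcribe(template_3_to_5):
    """Copy a DNA template strand (written 3'->5') into RNA (5'->3')."""
    return "".join(COMPLEMENT[base] for base in template_3_to_5)


def cleave_and_polyadenylate(pre_mrna, offset=15, tail_length=200):
    """Cut `offset` nt after the last AAUAAA signal and add a poly(A) tail.

    `offset` is an assumed parameter; transcripts without the signal
    are returned unchanged in this sketch.
    """
    signal = pre_mrna.rfind("AAUAAA")
    if signal == -1:
        return pre_mrna
    cut = min(len(pre_mrna), signal + len("AAUAAA") + offset)
    return pre_mrna[:cut] + "A" * tail_length


def splice(pre_mrna, introns):
    """Remove introns, given as half-open (start, end) pairs, and join exons."""
    exons, position = [], 0
    for start, end in sorted(introns):
        exons.append(pre_mrna[position:start])
        position = end
    exons.append(pre_mrna[position:])
    return "".join(exons)


def translate(mrna):
    """Read non-overlapping codons from the first AUG until a stop codon."""
    start = mrna.find("AUG")
    if start == -1:
        return ""
    peptide = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        peptide.append(amino_acid)
    return "".join(peptide)


# Hypothetical pre-mRNA: exon 1, one intron, exon 2 (with an AAUAAA signal).
EXON_1 = "GGAUGGCU"
INTRON = "GUAAGUCCCAG"
EXON_2 = "UUUUAAGCAAUAAAGGCCUUGGCC"
pre_mrna = EXON_1 + INTRON + EXON_2

processed = cleave_and_polyadenylate(pre_mrna, offset=4)
mature = splice(processed, [(len(EXON_1), len(EXON_1) + len(INTRON))])
print(translate(mature))  # MAF
```

The codon table plays the role that the collection of tRNAs plays in
the cell: it maps each triplet to its amino acid. Alternative splicing
could be mimicked in this sketch by passing different intron lists to
`splice` for the same pre-mRNA.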