canonical splicing signals, exhibiting exon and intron
lengths typical of protein-coding genes. In contrast to
protein-coding genes, however, lncRNAs display a striking
bias towards two-exon transcripts, they are predominantly
localized in the nucleus, and a fraction of them appear to be
preferentially post-processed into small RNAs. They are
under significant negative evolutionary selection, which
is weaker in exons, but not in promoters, compared to what
is observed in protein-coding genes. A relatively large
subset seems to have arisen within the primate lineage. A
subset of lncRNAs with high evolutionary conservation but
ambiguous coding potential may function as non-coding
RNAs or, alternatively, may encode small peptides. LncRNAs
are consistently expressed at lower levels than
protein-coding genes and display more tissue-specific
expression patterns, with a large fraction of tissue-specific
lncRNAs expressed in the brain. Expression correlation
analysis indicates that lncRNAs show particularly striking
positive correlation with the expression of overlapping
antisense coding genes. Finally, a few hundred lncRNAs
reside within intergenic regions previously associated with
specific diseases/traits by genome-wide association studies,
and they could be candidates for future disease-focused
studies.
The Expressed Transcriptome
Cellular specificity is achieved by the regulated expression
of selected sets of genes and transcripts. Studies using
expressed sequence tags have yielded relatively low
estimates of tissue specificity, but, as we have pointed out,
they have limited statistical power to detect differences in
isoform levels because of normalization. Microarray analyses
have achieved more consistent coverage of tissues, but are
constrained by their limited ability to distinguish closely
related mRNA isoforms and by their inability to identify
novel transcribed elements. RNASeq has the potential to
circumvent these limitations. Recent RNASeq analysis
performed in 15 different cell lines and subcellular
compartments in the context of the ENCODE project
provides a rich, unbiased survey of the transcriptional
landscape of the human genome [80]. These analyses
reveal that, within a given cell line, expression is detected,
on average, for about 40% of the genes. The range of gene
expression values within a cell covers about six orders of
magnitude for protein-coding and non-coding transcripts
(from 10^-2 to 10^4 RPKM) measured in the polyadenylated
RNA fraction of the cell, but it is consistently lower for
lncRNAs. Cumulatively, about 65% of all annotated genes
(91% of all protein-coding genes) are detected in the panel
of 15 investigated cell lines. Overall, about 14% of the
genes are specific to one cell line, and 37% are detected in
all cell lines and could be considered constitutively
expressed. However, the behavior is strikingly different for
protein-coding genes and lncRNAs. A considerable fraction
(29%) of all expressed lncRNAs are restricted to only
one of the cell lines, while only about 2% are expressed in
all cell lines. Conversely, a large fraction (55%) of
expressed protein-coding genes are expressed in all cell
lines, and only about 7% are cell-line specific. Variability of
gene expression across cell lines is also greater for
annotated non-coding RNAs than for protein-coding RNAs.
Expression variability is larger in the nucleus than in the
cytosol, possibly indicating that the contents of the
cytosolic subcompartment are buffered against stochastic
transcription through selective export from the nucleus. All
these results are consistent with the possibility that
lncRNAs may contribute more to cell line specificity than
protein-coding genes.
In principle, by attributing the sequence reads to the
underlying RNA isoforms, RNASeq allows for the
quantification of the abundance of individual transcript
species. As we have pointed out, however, deconvolution of
RNASeq reads is a difficult problem, given that
different isoforms from the same gene share a substantial
fraction of their sequence. The accuracy of methods to
quantify the abundance of individual transcript isoforms is
not currently established, and therefore the results of
analyses based on transcript quantification should be
considered only indicative. Still, a number of trends appear
to emerge [80]. When monitored in a homogeneous
population of cells, genes tend to express many isoforms
simultaneously. Isoforms, however, are not expressed at
similar levels, and usually one dominant isoform captures
a large fraction of the total expression of a given gene. The
dominant isoform often changes between conditions and
cell types. By comparing changes in gene expression and
changes in individual transcript abundances across multiple
conditions, it is possible to investigate whether changes in
the abundances of individual transcripts across cell lines are
mostly due to changes in overall gene expression or rather
to changes in the splicing ratios within a gene. RNASeq
analysis of individuals and the ENCODE cell lines reveals
that between 60% and 70% of the variability in the
abundance of transcript isoforms can be explained by
variability in gene expression [80,81]. Thus, regulation at
the level of gene expression appears to have a higher impact
on the abundance of spliced isoforms than does regulation at
the level of splicing.
Beyond contributing to the quantification of known
genes and transcripts, RNASeq experiments usually
uncover additional evidence for sites of transcription,
both within and outside the boundaries of annotated genes
[13,80,82]. For instance, although only about 85% of the
GENCODE genes are annotated with multiple transcripts,
Wang et al. [13] used RNASeq data to estimate that as many
as 92-94% of human genes undergo alternative splicing.
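The notions of cell-line-specific and constitutively expressed genes used in this section can be made concrete with a small calculation. The following is a minimal sketch under stated assumptions, not the procedure used in the ENCODE analysis [80]: it assumes the standard definition of RPKM (reads per kilobase of transcript per million mapped reads), and the detection threshold, function names, gene names, and toy values are all hypothetical.

```python
def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads."""
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


def expression_breadth(expression, threshold=0.0):
    """Count, for each gene, the cell lines in which it is detected.

    `expression` maps gene -> {cell_line: RPKM}. A gene counts as detected
    when its RPKM exceeds `threshold` (an assumed, illustrative cutoff).
    """
    return {
        gene: sum(1 for value in per_line.values() if value > threshold)
        for gene, per_line in expression.items()
    }


def summarize_breadth(expression, n_cell_lines, threshold=0.0):
    """Fractions of expressed genes seen in exactly one or in all cell lines."""
    breadth = expression_breadth(expression, threshold)
    expressed = [g for g, n in breadth.items() if n > 0]
    if not expressed:
        return {"expressed": 0, "specific": 0.0, "ubiquitous": 0.0}
    specific = sum(1 for g in expressed if breadth[g] == 1)
    ubiquitous = sum(1 for g in expressed if breadth[g] == n_cell_lines)
    return {
        "expressed": len(expressed),
        "specific": specific / len(expressed),
        "ubiquitous": ubiquitous / len(expressed),
    }


# Hypothetical RPKM values for four genes in three cell lines.
toy_expression = {
    "geneA": {"line1": 12.0, "line2": 8.5, "line3": 20.1},  # all lines
    "geneB": {"line1": 0.0, "line2": 3.2, "line3": 0.0},    # one line only
    "geneC": {"line1": 1.1, "line2": 0.0, "line3": 0.7},    # two lines
    "geneD": {"line1": 0.0, "line2": 0.0, "line3": 0.0},    # not detected
}

summary = summarize_breadth(toy_expression, n_cell_lines=3)
print(summary)
```

Running the same summary separately on protein-coding genes and on lncRNAs would give the kind of side-by-side comparison reported in the text. The choice of threshold affects both fractions, so any real analysis would need to justify it.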
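The comparison between gene-level and splicing-level regulation can be illustrated with a toy calculation. The sketch below is not the method of the cited studies [80,81]. It assumes that an isoform's abundance is the product of the gene's total expression and the isoform's splicing ratio, and it measures the share of variability explained by gene expression as the squared Pearson correlation between log isoform abundance and log gene expression across conditions. The function name and all numbers are hypothetical.

```python
import math
from statistics import mean


def _variance(values):
    """Population variance of a sequence of numbers."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / len(values)


def _covariance(xs, ys):
    """Population covariance of two equally long sequences."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)


def fraction_explained_by_gene(isoform_abundance, gene_expression):
    """Squared correlation of log isoform abundance with log gene expression.

    Both arguments are sequences of strictly positive values, one entry per
    condition (for example, per cell line). Returns 0.0 when either quantity
    does not vary at all.
    """
    log_iso = [math.log(v) for v in isoform_abundance]
    log_gene = [math.log(v) for v in gene_expression]
    var_iso = _variance(log_iso)
    var_gene = _variance(log_gene)
    if var_iso == 0 or var_gene == 0:
        return 0.0
    cov = _covariance(log_iso, log_gene)
    return cov ** 2 / (var_iso * var_gene)


# Case 1: gene expression changes, splicing ratio fixed at 0.5.
gene_only = fraction_explained_by_gene([5, 10, 20, 40], [10, 20, 40, 80])

# Case 2: gene expression constant, only the splicing ratio changes.
splicing_only = fraction_explained_by_gene([1, 3, 5, 9], [10, 10, 10, 10])

print(gene_only, splicing_only)
```

In the first toy case all isoform variability tracks the gene, so the fraction is essentially 1. In the second it is 0 because the gene does not vary. Real data fall between these extremes, and a figure of 60% to 70% corresponds to the gene-expression component dominating without fully determining isoform abundance.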
Overall, RNASeq studies underline that the transcriptional