Alternative splicing: conservation and function (Genomics)

At least half of human genes seem to be alternatively spliced (Lander et al., 2001). This estimate is mainly based on the comparison of genomic DNA with EST (expressed sequence tag, see Article 78, What is an EST?, Volume 4) sequences (Mironov et al., 1999; Brett et al., 2000), and thus is subject to uncertainty stemming from the fact that the ESTs do not necessarily correspond to functional mRNA. Even if experimental artifacts such as underspliced transcripts could be eliminated, there remains a problem of errors by the splicing machinery itself, the so-called aberrant splicing (see Article 87, Manufacturing EST libraries, Volume 4).

Indeed, the normalization of mRNA concentrations during construction of clone libraries leads to the sequencing of ESTs arising from rare mRNA isoforms. Further, computational analysis has demonstrated the existence of numerous cancer-specific ESTs (more exactly, ESTs corresponding to cancer-specific alternatively spliced isoforms, see Article 82, Using ORESTES ESTs to mine gene cancer expression data, Volume 4) (Wang et al., 2003; Sorek et al., 2003; Xie et al., 2002; Xu and Lee, 2003), the emergence of which could be due to the general disruption of control mechanisms in cancerous cell lines. Although one could claim that almost all human genes show some evidence of alternative splicing, when stricter criteria are considered (e.g. at least two ESTs supporting an alternative splicing event), the fraction of alternatively spliced genes decreases to 17-28% (Kan et al., 2002).


A new twist to this discussion was added when several groups attempted to compare alternative splicing of human and mouse genes (Thanaraj et al., 2003; Modrek and Lee, 2003; Modrek et al., 2001; Nurtdinov et al., 2003). Surprisingly, it turned out that a considerable fraction of human genes have alternatively spliced isoforms that are not conserved in mouse.

Two different approaches have been applied to compare human and mouse alternative splicing. One of them was a direct comparison of human and mouse ESTs. This approach demonstrated that at least 15% of human splice junctions (introns) are conserved in mouse (Thanaraj et al., 2003). A similar estimate was made for different types of elementary alternatives considered separately; exon-skipping events were shown to be more conserved than alternative splice sites (Sugnet et al., 2004). However, as the mouse EST data, at least, are far from saturation, this is clearly a lower bound on the fraction of conserved alternative splicing.

The other approach is based on aligning human protein isoforms to mouse genomic DNA using spliced alignment algorithms (Mironov et al., 2001; see also Article 15, Spliced alignment, Volume 7) or simply BLAST (Altschul et al., 1997; see also Article 39, IMPALA/RPS-BLAST/PSI-BLAST in protein sequence analysis, Volume 7). Here, an isoform is assumed to be conserved if the alternative region aligns to the mouse genome without frameshifts and is bounded by the standard GT-AG dinucleotides. It is clear that this definition yields an upper estimate on the number of conserved isoforms, since these conditions are necessary but not sufficient: an isoform may be nonexistent due to changes in splice site positions other than GT-AG, or to changes in regulatory sites such as splicing enhancers. Further, this definition does not take into account nonconserved exon-skipping events.
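The two necessary conditions above can be illustrated with a minimal Python sketch. It is not the procedure used in the cited studies: the function names, the 0-based half-open coordinates, and the representation of the alignment as a list of gap lengths are all assumptions made for illustration, and the example covers only an internal cassette exon (for an alternative donor or acceptor site only one boundary would be relevant).

```python
from typing import Sequence


def flanked_by_canonical_sites(genome: str, exon_start: int, exon_end: int) -> bool:
    """Check that the exon genome[exon_start:exon_end] is bounded by AG ... GT.

    The upstream intron must end with the acceptor dinucleotide AG and the
    downstream intron must begin with the donor dinucleotide GT.
    Coordinates are 0-based and half-open (an assumption of this sketch).
    """
    if exon_start < 2 or exon_end + 2 > len(genome):
        return False  # not enough flanking sequence to inspect
    acceptor = genome[exon_start - 2:exon_start].upper()
    donor = genome[exon_end:exon_end + 2].upper()
    return acceptor == "AG" and donor == "GT"


def has_no_frameshift(gap_lengths: Sequence[int]) -> bool:
    """Alignment gaps (in nucleotides) keep the frame only if divisible by 3."""
    return all(gap % 3 == 0 for gap in gap_lengths)


def passes_conservation_filter(genome: str, exon_start: int, exon_end: int,
                               gap_lengths: Sequence[int]) -> bool:
    """Necessary, not sufficient, conditions for calling an exon conserved."""
    return (flanked_by_canonical_sites(genome, exon_start, exon_end)
            and has_no_frameshift(gap_lengths))


# Hypothetical toy sequence: intron (lower case), exon ATGGCC, intron.
toy_genome = "ccccag" + "ATGGCC" + "gtaaaa"
print(passes_conservation_filter(toy_genome, 6, 12, []))   # no gaps
print(passes_conservation_filter(toy_genome, 6, 12, [1]))  # 1-nt gap shifts the frame
```

A region that passes such a filter is only a candidate: as noted above, the test says nothing about regulatory sites or about whether the exon is actually included in mouse transcripts.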

This approach, applied in (Nurtdinov et al., 2003), demonstrated that at least half (55%) of 166 alternatively spliced human genes had isoforms not conserved in their mouse orthologs. This was due to about 25% of unconserved elementary alternatives. Notably, similar results were obtained for elementary alternatives confirmed by mRNAs (24% nonconserved) and by ESTs only (31%).

A much larger sample in a similar setting was analyzed in (Modrek and Lee, 2003), where only cassette exons were considered. All such exons were divided into exons included in the major isoforms (i.e., present in the majority of ESTs overlapping the relevant region) and the minor form exons. The former were found to be conserved in 98% of cases, whereas only about a quarter (27%) of the latter were conserved. Similar results were obtained in a smaller-scale human-rat comparison. The average conservation of both types of exons, 75%, is remarkably close to the degree of conservation of elementary alternatives reported in (Nurtdinov et al., 2003).
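The major/minor distinction described above amounts to a majority rule over the ESTs that overlap the exon region. The following Python sketch shows one way to express it; the function name, its arguments, and the treatment of an exact 50% tie as "minor" are assumptions for illustration rather than details taken from the cited study.

```python
def classify_cassette_exon(n_including: int, n_skipping: int) -> str:
    """Label a cassette exon by the share of overlapping ESTs that include it.

    n_including: ESTs covering the region that contain the exon.
    n_skipping:  ESTs covering the region that skip the exon.
    Returns "major" if the exon is present in a strict majority of ESTs,
    otherwise "minor" (ties are called "minor" by assumption).
    """
    if n_including < 0 or n_skipping < 0:
        raise ValueError("EST counts must be non-negative")
    total = n_including + n_skipping
    if total == 0:
        raise ValueError("no ESTs overlap the region")
    # Integer comparison avoids floating-point issues at exactly one half.
    return "major" if 2 * n_including > total else "minor"


print(classify_cassette_exon(8, 2))  # exon present in most ESTs
print(classify_cassette_exon(1, 9))  # exon present in few ESTs
```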

An important question is whether these nonconserved alternatives are real or arise from splicing errors and similar artifacts. The number of documented functional nonconserved alternatively spliced isoforms is not large (Nurtdinov et al., 2003). In fact, it has been suggested that most nonconserved isoforms are not functional (Sorek et al., 2004). The fraction of nonconserved cassette (skipped) exons, identified by a combination of EST analysis and genomic comparisons, was similar to that of the two studies mentioned above (75%). However, it was demonstrated that most nonconserved exons (79%) either led to a frameshift (because their length was not a multiple of three) or contained an in-frame stop codon. By contrast, only 27% of conserved cassette exons interrupted the reading frame. The difference decreased when exons supported by multiple ESTs were considered (46% interrupting exons, among exons supported by at least five ESTs).
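The two ways in which a cassette exon can interrupt the reading frame, a length that is not a multiple of three or an in-frame stop codon, can be checked with a few lines of Python. This is a simplified sketch, not the cited authors' method: the function name and the `phase` parameter (the offset of the first complete codon within the exon) are assumptions, and codons that span the exon boundaries are not examined.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def interrupts_reading_frame(exon_seq: str, phase: int = 0) -> bool:
    """Return True if inserting the exon would disrupt the reading frame.

    exon_seq: nucleotide sequence of the cassette exon (DNA alphabet).
    phase:    offset (0, 1 or 2) of the first complete codon in the exon.
    Codons split across the exon boundaries are ignored in this sketch.
    """
    if phase not in (0, 1, 2):
        raise ValueError("phase must be 0, 1 or 2")
    seq = exon_seq.upper()
    # A length not divisible by three shifts the frame of everything downstream.
    if len(seq) % 3 != 0:
        return True
    # Otherwise look for a stop codon among the complete in-frame codons.
    for i in range(phase, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return True
    return False


print(interrupts_reading_frame("ATGGCC"))     # frame preserved, no stop
print(interrupts_reading_frame("ATGGC"))      # length 5 causes a frameshift
print(interrupts_reading_frame("ATGTAAGCC"))  # in-frame TAA
```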

Does that mean that the majority of nonconserved isoforms are nonfunctional? Frame interruption per se does not make an isoform nonfunctional. Indeed, about 40% of both human (Modrek et al., 2001) and mouse (Zavolan et al., 2002) alternative isoforms identified from EST and full-length cDNA analysis have an interrupted reading frame, and a smaller estimate (22%) was obtained in the analysis of published experimental data (Thanaraj and Stamm, 2003). An intermediate fraction of such isoforms (35%) was reported in (Lewis et al., 2003); moreover, it was demonstrated that most such isoforms would be subject to nonsense-mediated mRNA decay, as the stop codon occurred more than fifty nucleotides upstream of the 3′-most exon-exon junction. As this trend persisted after the filtering of less-reliable isoforms, it is likely that the frame-interrupting isoforms are functional; one suggested possibility was that they are involved in the regulation of splicing, translation, and mRNA degradation.
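The positional rule for nonsense-mediated decay is commonly stated as follows: a transcript is a candidate if its stop codon lies more than about 50 nucleotides upstream of the last (3′-most) exon-exon junction. The Python sketch below applies that rule to a spliced transcript described by its exon lengths. The function name, the coordinate convention (0-based transcript position of the first base after the stop codon), and the exact way the distance is measured are assumptions of this example; published analyses differ in such details.

```python
from typing import Sequence


def is_nmd_candidate(stop_codon_end: int, exon_lengths: Sequence[int],
                     min_distance: int = 50) -> bool:
    """Apply the 50-nucleotide rule to a spliced transcript.

    stop_codon_end: 0-based transcript coordinate of the first base after
                    the stop codon (assumed convention).
    exon_lengths:   lengths of the exons in 5' to 3' order.
    min_distance:   the stop codon must lie more than this many nucleotides
                    upstream of the last exon-exon junction.
    """
    if len(exon_lengths) < 2:
        return False  # a single-exon transcript has no junctions
    # Transcript coordinate of the last (3'-most) exon-exon junction.
    last_junction = sum(exon_lengths[:-1])
    return last_junction - stop_codon_end > min_distance


exons = [100, 100, 100]              # hypothetical transcript, last junction at 200
print(is_nmd_candidate(120, exons))  # stop 80 nt upstream of the last junction
print(is_nmd_candidate(250, exons))  # stop in the last exon
```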

Indeed, a different line of evidence for functionality of nonconserved isoforms was considered in (Modrek and Lee, 2003). In many cases, the minor-form nonconserved exons not only were supported by multiple ESTs but also demonstrated evidence for tissue-specific expression, and constituted a majority in that tissue.

Thus, the open question seems to be not the reality of nonconserved isoforms but their functionality. A large-scale proteomic study will be necessary to determine whether these isoforms are translated and yield protein products.

In any case, alternative splicing was demonstrated to have a major effect on the protein structure (Kriventseva et al., 2003). When compared with a random model, alternative splicing was shown to prefer shuffling complete protein domains instead of disrupting domains or falling into interdomain regions, and to target functional sites when it occurs within a domain. Indeed, alternative splicing often involves domains implicated in protein-protein interactions (Resch et al., 2004). Further, it was shown that alternative splicing has a tendency to remove gene regions encoding signal peptides and single transmembrane segments, thus producing secreted, membrane-bound, and cytosolic isoforms of proteins (Xing et al., 2003; Cline et al., 2004).

Thus, alternative splicing is a major mechanism for generating protein diversity, both in extant organisms and in evolution. The latter explanation is supported by additional observations: evidence for positive selection based on analysis of synonymous and nonsynonymous nucleotide substitutions (Iida and Akashi, 2000) and the fact that all Alu-derived protein-coding regions of human genes are alternatively spliced (Sorek et al., 2004). Indeed, an elegant theory of Modrek and Lee (2003) states that alternative splicing provides an organism with a possibility to experiment with new protein functions while not disrupting the old protein. If a new variant proves to be beneficial, its fraction may increase due to subtle changes in regulatory sites.

However, this does not explain why generation of protein variability cannot be obtained by gene duplication. Another less-appreciated role of alternative splicing could be that of maintaining protein identity. Indeed, in many cases, a cell needs proteins that are different in some domains and exactly identical in others. The most obvious example of this is given by membrane, secreted, and intracellular isoforms of various receptors. The recognition or ligand-binding domain should be the same, whereas the membrane anchor or a signal peptide is encoded by alternative exons. It is clear that such an arrangement cannot be obtained by gene duplication, as this would require an expensive mechanism for maintaining the identity of those DNA fragments that should encode identical domains.

Overall, computational comparative analysis of alternative splicing is a hot and important topic. The next step probably would be in merging the diverse approaches, aimed at the description of all aspects of the alternative splicing phenomenon: evolution of the exon-intron structure and of sequence in alternatively spliced regions, regulation, consequences for the protein structure and function, and so on. It is also clear that such analyses will not be restricted to the study of mammals (human-mouse-rat). Other appealing groups of already available genomes are the two nematodes (Caenorhabditis elegans and Caenorhabditis briggsae, see Article 44, The C. elegans genome, Volume 3) and also fruit flies (Drosophila melanogaster, Drosophila pseudoobscura, and others) with the malarial mosquito Anopheles gambiae serving as an outlier; to be complemented, as sequencing of eukaryotic genomes progresses, by chicken, fishes (Takifugu rubripes, Danio rerio, see Article 46, The Fugu and Zebrafish genomes, Volume 3), honeybee, and plants.
