Exonic splicing enhancers and exonic splicing silencers (Bioinformatics)

1. Signals affecting the splicing of messenger RNA precursors

RNA splicing is the process by which some sections of a primary RNA transcript (the introns) are removed and the retained sections (the exons) are joined together. Splicing is carried out by the spliceosome, a large macromolecular machine consisting of five spliceosomal RNAs (U1, U2, U4, U5, and U6) and perhaps as many as 100 proteins. Assembly of a spliceosome on the pair of splice sites flanking an intron involves the recognition of splicing signals in the messenger RNA precursor (pre-mRNA) by splicing factors. Any element in the DNA sequence of a gene that helps to specify accurate splicing of its primary RNA transcript into the mature RNA product is a splicing signal. In genes encoding proteins, splicing signals lie at the splice sites themselves and in the exons and introns flanking them. Splicing signals within exons that promote splicing are known as exonic splicing enhancers (ESEs), while exonic signals that repress splicing are known as exonic splicing silencers (ESSs).

5′ splice sites (those at the 5′ boundary of an intron) are generally similar to each other at the last three exon nucleotides and the first seven intron nucleotides, so that the sequence spanning the splice site resembles MAG|GTRAGTA (where | marks the exon-intron boundary, M indicates A or C, and R indicates A or G; the Ts here would be U in the RNA sequence). The GT dinucleotide at the start of the intron is invariant, or nearly so. This consensus is complementary to the 5′ end of the U1 small nuclear RNA, which is a component of the U1 snRNP (small nuclear ribonucleoprotein). Although the U1 snRNP is primarily responsible for recognizing 5′ splice sites during the initial stages of splicing, it does not act in isolation. Not all sites bound by the U1 snRNP are ultimately used as 5′ splice sites. Factors bound to other sites on the pre-mRNA, some of which are discussed below, can promote binding between the U1 snRNP and the 5′ splice site or can facilitate progression along the pathway toward splicing. Furthermore, the 5′ splice site is later "examined" by additional factors in the course of splicing. One such factor is U6 snRNA, which displaces U1 prior to the first catalytic step and remains associated with the 5′ splice site throughout the remainder of the splicing reaction. In general, initial selection of the 5′ splice site by the U1 snRNP is influenced by other factors and must be followed by appropriate interactions between the 5′ splice site and other components of the spliceosome.
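To make the consensus notation concrete, the sketch below scores candidate windows against MAG|GTRAGTA by counting matching positions, treating the intron-initial GT as obligatory. This is a simple illustration, not a published splice site predictor: the function names and the cutoff of 8 matching positions are hypothetical choices, and real tools weight positions rather than counting matches.

```python
# Hypothetical helper: score candidate 5' splice sites against the
# consensus MAG|GTRAGTA (3 exon nt + 7 intron nt), using IUPAC codes.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "M": "AC", "R": "AG", "Y": "CT"}

FIVE_SS_CONSENSUS = "MAGGTRAGTA"  # exon-intron boundary falls after index 2
EXON_NT = 3  # number of consensus positions on the exon side


def score_5ss(window: str, consensus: str = FIVE_SS_CONSENSUS) -> int:
    """Return how many positions of a candidate window match the consensus."""
    window = window.upper().replace("U", "T")  # accept RNA or DNA input
    if len(window) != len(consensus):
        raise ValueError("window length must equal consensus length")
    return sum(base in IUPAC[code] for base, code in zip(window, consensus))


def scan_5ss(seq: str, min_score: int = 8) -> list[tuple[int, int]]:
    """Return (intron_start, score) for windows with GT at the intron start.

    min_score is an arbitrary illustrative cutoff, not a published value.
    """
    seq = seq.upper().replace("U", "T")
    width = len(FIVE_SS_CONSENSUS)
    hits = []
    for i in range(len(seq) - width + 1):
        window = seq[i:i + width]
        # Treat the GT (GU in RNA) at the start of the intron as obligatory.
        if window[EXON_NT:EXON_NT + 2] != "GT":
            continue
        score = score_5ss(window)
        if score >= min_score:
            hits.append((i + EXON_NT, score))  # 0-based first intron nt
    return hits


if __name__ == "__main__":
    print(scan_5ss("TTTCAGGTAAGTATTT"))
```

In the usage example the only reported hit is the perfect consensus CAG|GTAAGTA, with the intron starting at 0-based position 6.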


The 3′ splice site usually lies immediately 3′ of the trinucleotide CAG or UAG (occasionally AAG, but almost never GAG). In addition, the first nucleotide of the exon adjacent to the 3′ splice site shows some conservation, bringing the number of conserved 3′ splice site nucleotides to four (YAG|G, where Y is C or U and the AG dinucleotide at the end of the intron is invariant, or nearly so). How can the 3′ splice site be specified by so few nucleotides? The answer lies in the sequence immediately upstream of the 3′ splice site. Additional signals here include the branch site, which is where the 5′ end of the intron is joined to an adenosine within the intron via an unusual 2′-5′ phosphodiester bond in the first step of splicing. The branch site is typically 16 to 45 nucleotides upstream of the 3′ splice site. It is first recognized by a protein known as SF1 and later associates with U2 snRNA. The region between the branch site and the 3′ splice site, which is typically pyrimidine-rich (the polypyrimidine tract), is bound by the large subunit of U2AF (U2 auxiliary factor), a dimeric protein that later recruits the U2 snRNP to the branch site immediately upstream. The small subunit of U2AF binds to the 3′ splice site itself. As with the U1 snRNP, recruitment of U2AF by other factors plays an important role in the selection of splice sites. All of these factors are highly conserved among eukaryotes, but species differ considerably in the extent to which the different components of this multipart 3′ splice site signal are conserved.
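The short YAG|G signal and the upstream branch site can be combined in a simple scan: find each YAG|G match, then report the window 16 to 45 nucleotides upstream where the branch site would typically be sought. The sketch below is illustrative only. The function name is hypothetical, and it assumes that "n nucleotides upstream" means n positions before the first exon nucleotide; the text does not fix that convention.

```python
# Hypothetical helper: list YAG|G candidate 3' splice sites together with the
# window 16-45 nt upstream in which the branch site typically lies.
def candidate_3ss(seq: str, require_exon_g: bool = True,
                  min_dist: int = 16, max_dist: int = 45):
    """Return (exon_start, branch_start, branch_end) tuples.

    exon_start is the 0-based index of the first exon nucleotide, i.e. the
    position immediately 3' of the intron-terminal AG. seq[branch_start:
    branch_end] covers positions min_dist..max_dist nt upstream of
    exon_start (assumed convention), clipped at the start of seq.
    """
    seq = seq.upper().replace("U", "T")
    sites = []
    for i in range(1, len(seq) - 1):
        if seq[i:i + 2] != "AG":     # nearly invariant intron-terminal AG
            continue
        if seq[i - 1] not in "CT":   # Y: C or U (T in DNA)
            continue
        exon_start = i + 2
        # Optional, weaker requirement: G as the first exon nucleotide.
        if require_exon_g and (exon_start >= len(seq) or seq[exon_start] != "G"):
            continue
        branch_start = max(0, exon_start - max_dist)
        branch_end = max(0, exon_start - min_dist + 1)  # half-open slice end
        sites.append((exon_start, branch_start, branch_end))
    return sites


if __name__ == "__main__":
    print(candidate_3ss("A" * 50 + "CAGGT"))
```

An AAG followed by G is deliberately ignored here because the sketch requires a pyrimidine before the AG; relaxing that rule would capture the occasional AAG sites the text mentions.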

2. Exonic splicing enhancers

Auxiliary signals distinct from the core splice site signals described above also influence the outcome of pre-mRNA splicing. Prominent among these are ESEs, sequences within the exons flanking an intron that are required for splicing to occur, either in vivo or in vitro. Although splicing enhancers have been identified in both exons and introns, ESEs are generally better characterized and are probably more common. ESEs activate nearby splice sites (both 5′ and 3′ splice sites) and promote the inclusion (vs. skipping) of the exons in which they reside. ESEs were initially recognized as purine-rich motifs containing repeated GAR (GAA or GAG) trinucleotides. However, many other sequences have now been shown to have enhancer activity (see Zheng, 2004 for review). A small number of well-defined enhancer-dependent splicing events (notably the IgM M2 exon and the female-specific exon of the Drosophila melanogaster doublesex gene) have been used by researchers in the field to define and characterize numerous splicing enhancers, and this approach is validated by the general observation that an ESE active in one assay is typically active in others.
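The classic purine-rich ESE described above can be searched for with a regular expression for consecutive GAR trinucleotides. This is only a sketch: the function name and the minimum of three repeats are arbitrary illustrative choices, and, as the text notes, many active ESEs do not match this pattern at all.

```python
import re


def find_gar_runs(seq: str, min_repeats: int = 3) -> list[tuple[int, int, str]]:
    """Return (start, end, match) for runs of >= min_repeats GAR trinucleotides.

    GAR means GAA or GAG. min_repeats is a hypothetical threshold chosen
    for illustration, not a published definition of a GAR-type ESE.
    """
    pattern = re.compile("(?:GA[AG]){%d,}" % min_repeats)
    seq = seq.upper().replace("U", "T")
    return [(m.start(), m.end(), m.group()) for m in pattern.finditer(seq)]


if __name__ == "__main__":
    print(find_gar_runs("CCGAAGAGGAACC"))
```

Because `finditer` scans left to right and matches greedily, each reported run is the longest GAR run beginning at its start position.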

Many ESEs are bound and activated by one or more of a family of related splicing factors known as SR proteins (Graveley, 2000; Cartegni et al., 2002). SR proteins contain one or two RNA-binding domains and an “RS” domain characterized by numerous arginine-serine dipeptide repeats. SR proteins are essential not only for splicing as a whole but also for each of the first three recognizable steps of spliceosome assembly. In vitro, any one of several SR proteins can restore splicing to an extract lacking SR proteins, so the essential functions of individual SR proteins in splicing are at least partially redundant. However, the activation of splicing by SR proteins through ESEs shows considerable specificity: individual SR proteins differ in the sequence specificity of their RNA-binding domains and in their ability to recognize and activate different ESE sequences (e.g., Liu et al., 1998).

The relationship between sequence-specific binding by SR proteins and the activation of splicing by ESEs is complex and incompletely understood. Both restoration of splicing and activation of some enhancer-dependent splicing events by an SR protein lacking the RS domain have been reported (Zhu and Krainer, 2000). Conversely, recruitment of the RS domain of SR proteins to an RNA by means of an unrelated RNA-binding domain is sufficient to activate enhancer-dependent splicing events, implying that the role of ESEs is to recruit such an activation domain (Graveley and Maniatis, 1998). Although only a few dozen splicing events have been shown to be enhancer dependent, the existence of ESEs within constitutively spliced exons (e.g., Schaal and Maniatis, 1999) suggests that ESEs are ubiquitous, redundant, and required for all splicing events. This possibility is supported by the observation that RNA-binding activity is required in order for an SR protein to support splicing in vitro. Recent evidence that SR proteins function in mRNA export (Huang et al., 2004) and nonsense-mediated decay (Zhang and Krainer, 2004) suggests that ESEs may play a role in these processes as well.

3. Exonic splicing silencers

Sequences in exons that act to suppress local splicing events are known as exonic splicing silencers (ESSs). Although ESSs are less well characterized than ESEs, they may be just as important in overall splice site selection, particularly in suppressing the inclusion of pseudoexons, regions within long introns that might otherwise be incorporated into mRNA as exons (Sun and Chasin, 2000). In many cases, ESSs are known to be bound by hnRNP proteins (particularly hnRNP A, hnRNP H, or hnRNP I, which is also known as PTB, polypyrimidine tract-binding protein), but the basis of ESS activity appears likely to be more variable than that of ESE activity. In principle, an ESS (acting through whatever protein or proteins recognize it) can act either by blocking the action of one or more ESEs or by directly blocking the activation of nearby splice sites.

4. Identification of ESEs and ESSs

It appears likely that many sequences can act as splicing enhancers or silencers; it is estimated that as many as 15-20% of random 20-nt sequences contain a splicing enhancer (Blencowe, 2000). ESEs were originally identified when mutational analysis of genes revealed that sequences within exons were required for proper splicing. Many more sequences were found when high-throughput selections for short enhancer sequences were carried out, either in vitro (e.g., Liu et al., 1998) or in vivo (e.g., Coulter et al., 1997). The most definitive assays of this type have been carried out with extracts dependent upon a single purified SR protein (Liu et al., 1998). Because these ESE assays depend on recovering the ESE as part of a spliced RNA, they cannot recover ESSs, which are not part of the spliced RNA. Experimental identification of ESSs has therefore lagged behind, but was recently accomplished by Wang et al. (2004) using a cell-sorting assay.
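One way to get a feel for the 15-20% estimate is to ask what per-motif enhancer frequency it implies. The sketch below assumes, for illustration only, that enhancers are hexamers and that each of the 15 overlapping hexamers in a 20-nt sequence is independently an enhancer with probability f, so that P(at least one) = 1 - (1 - f)^15. Overlapping windows are not truly independent, so this is an order-of-magnitude calculation, not a model of enhancer activity; the function names are hypothetical.

```python
def p_contains(f: float, seq_len: int = 20, k: int = 6) -> float:
    """P(a random seq_len-mer contains >= 1 enhancer k-mer), assuming each of
    its seq_len - k + 1 overlapping k-mers is independently an enhancer
    with probability f (a simplifying assumption)."""
    windows = seq_len - k + 1
    return 1.0 - (1.0 - f) ** windows


def implied_fraction(p: float, seq_len: int = 20, k: int = 6) -> float:
    """Invert p_contains: the per-k-mer enhancer fraction f implied by p."""
    windows = seq_len - k + 1
    return 1.0 - (1.0 - p) ** (1.0 / windows)


if __name__ == "__main__":
    for p in (0.15, 0.20):
        f = implied_fraction(p)
        print(f"p={p:.2f}: f~{f:.4f} (~{f * 4 ** 6:.0f} of 4096 hexamers)")
```

Under these assumptions the estimate corresponds to roughly 1.1% to 1.5% of hexamers, on the order of 45 to 60 of the 4,096 possible hexamers.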

Methods for the computational identification of ESEs and ESSs are generally based on some measure of oligonucleotide frequency. Indeed, the content-based measures for detecting coding sequence used in many gene finders (see Article 14, Eukaryotic gene finding, Volume 7 and Article 16, Searching for genes and biologically related signals in DNA sequences, Volume 7) (Zhang, 2002) are likely to reflect frame-independent preferences (Antezana and Kreitman, 1999), including ESEs and the absence of ESSs. Gibbs sampler techniques have been used to identify ESE motifs within experimental data (Liu et al., 1998) and are the basis of the SEE-ESE web server (http://www.tigr.org/software/SeeEse/eseF.html). RESCUE-ESE (Fairbrother et al., 2002), a bivariate statistical analysis that requires a hexamer to be overrepresented both in constitutive exons relative to introns and in exons with weak splice sites relative to exons with strong splice sites, identified hexamers whose activity was later verified experimentally and through patterns of conservation (Fairbrother et al., 2004a). Direct use of conservation at synonymous sites has proven useful for predicting experimentally verified ESEs in Arabidopsis (Pertea, Salzberg and Mount, in preparation). Zhang and Chasin (2004) identified ESS motifs by comparing the frequency of 8-mers in internal noncoding exons versus unspliced pseudoexons, and similar motifs were identified by the high-throughput experimental screens of Wang et al. (2004).
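The two-comparison idea behind RESCUE-ESE can be sketched as follows: compute a log2 enrichment of each k-mer in one sequence set relative to another, and keep k-mers enriched in both comparisons. This is a simplified illustration, not the published procedure. The pseudocount, the log2 threshold of 1.0, and all function names are hypothetical, and the original analysis used its own statistics and data sets.

```python
import math
from collections import Counter
from itertools import product


def kmer_counts(seqs, k=6):
    """Count overlapping k-mers (A/C/G/T only) across a list of sequences."""
    counts = Counter()
    for s in seqs:
        s = s.upper().replace("U", "T")
        for i in range(len(s) - k + 1):
            kmer = s[i:i + k]
            if set(kmer) <= set("ACGT"):
                counts[kmer] += 1
    return counts, sum(counts.values())


def log2_enrichment(foreground, background, k=6, pseudo=1.0):
    """log2(freq in foreground / freq in background) for every possible k-mer."""
    fg_counts, fg_total = kmer_counts(foreground, k)
    bg_counts, bg_total = kmer_counts(background, k)
    n = 4 ** k
    scores = {}
    for kmer in map("".join, product("ACGT", repeat=k)):
        # Pseudocounts keep unseen k-mers from producing log(0).
        p_fg = (fg_counts[kmer] + pseudo) / (fg_total + pseudo * n)
        p_bg = (bg_counts[kmer] + pseudo) / (bg_total + pseudo * n)
        scores[kmer] = math.log2(p_fg / p_bg)
    return scores


def candidate_eses(exons, introns, weak_exons, strong_exons, k=6, threshold=1.0):
    """k-mers enriched in exons vs introns AND in weak- vs strong-site exons."""
    exon_vs_intron = log2_enrichment(exons, introns, k)
    weak_vs_strong = log2_enrichment(weak_exons, strong_exons, k)
    return sorted(m for m in exon_vs_intron
                  if exon_vs_intron[m] >= threshold
                  and weak_vs_strong[m] >= threshold)
```

A comparison in the spirit of Zhang and Chasin (2004) could reuse `log2_enrichment` with k=8, taking pseudoexons as the foreground and noncoding exons as the background to look for candidate silencers.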

5. ESEs and ESSs in genetic disease

Web servers are available for the identification of potential animal ESEs based on in vitro selection (Cartegni et al., 2003) or RESCUE-ESE (Fairbrother et al., 2004b) and for Arabidopsis ESEs (SEE-ESE, Pertea, Salzberg and Mount, in preparation). In part because of these web servers, there has been increased recognition in recent years that mutations in ESEs can cause genetic disease. As a result, since 2000 there has also been explosive growth in the number of papers describing mutations that inactivate an ESE.
