Cuticular Proteins (Insect Molecular Biology) Part 4

Motifs Found in Cuticular Proteins that do not Define Families

The review by Andersen et al. (1995) was the first to assemble a variety of motifs found in cuticular proteins. It is now possible to distinguish between two classes of motifs. The first defines a family, such as the CPR and CPF families, while the second includes motifs that occur commonly in cuticular proteins but are found in more than one family; many of these are very short. It is this second class that is discussed in this section.

The most common short motif described by Andersen et al. (1995) was A-A-P-(A/V). Once the cuticular proteins of An. gambiae had been annotated, it became necessary to expand that motif to A-A-P-(A/V/L). While one or two instances of that motif are found in many proteins, especially chorion proteins, the occurrence of three or more in a single protein appears to be restricted to cuticular proteins (Willis, 2010). The function of this motif was discussed by Andersen et al. (1995), who concluded:
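
The counting rule above (one or two instances are common, three or more appear restricted to cuticular proteins) can be sketched as a simple sequence scan. This is a minimal Python sketch, assuming plain one-letter protein sequences; the function names and the threshold parameter are hypothetical and not part of any published tool, and a count of three or more is only a heuristic flag, not a family assignment.

```python
import re

# A-A-P-(A/V/L), the motif as expanded after annotation of An. gambiae proteins.
# The zero-width lookahead lets overlapping occurrences (e.g. AAPAAPV) both be counted.
AAP_MOTIF = re.compile(r"(?=AAP[AVL])")

def count_aap_motifs(seq: str) -> int:
    """Count possibly overlapping A-A-P-(A/V/L) occurrences in a protein sequence."""
    return sum(1 for _ in AAP_MOTIF.finditer(seq.upper()))

def has_cuticular_aap_signature(seq: str, threshold: int = 3) -> bool:
    """Hypothetical heuristic: three or more occurrences has so far been seen only in cuticular proteins."""
    return count_aap_motifs(seq) >= threshold
```

For example, the made-up sequence "MKAAPAGGAAPVSSAAPLK" contains one copy each of AAPA, AAPV and AAPL and would therefore be flagged.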

A relevant feature of the Ala-Ala-Pro-Ala motif appears to be a strong tendency to form turns; several conformations can be present in equilibrium, indicating low energy barriers between the conformations. When the sequence occurs regularly in a protein, as it does in many of the CPs as well as in other structural proteins, it can be suggested that the result will be proteins folded in a more or less regular helix, which is easily and reversibly deformed by external forces, thereby resembling elastin.


Andersen et al. (1995) recognized several sequences with stretches of glycine, leucine, and tyrosine, beginning with G-Y-G-L or G-L-L-G. Other cuticular proteins are also rich in glycine, but with less regular motifs; these are designated by the number of consecutive Gs. Proteins enriched in glycine residues are found in a variety of structures, such as plant cell walls, cockroach ootheca, and silk (see Bouhin et al., 1992a, for discussion). Subsequent to their 1995 review, Andersen and his colleagues recognized two additional motifs.

Three copies of an 18-residue motif were found in a B. mori protein (PCP, now named BmorCPH31) by Nakato et al. (1992). Subsequently, Andersen (2000) recognized the repeat in a small number of cuticular proteins from four orders of insects and two crustaceans. A sequence logo based on its occurrence in 27 proteins from 5 orders of insects and 2 crustaceans is shown in Figure 3. These proteins include some with the R&R Consensus, especially those assigned as RR-3, as well as others, like BmorCPH31 and four other B. mori cuticular proteins, that do not have this Consensus (Futahashi et al., 2008).
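
A sequence logo such as the one in Figure 3 stacks the residues at each aligned position, with heights proportional to each residue's frequency multiplied by the information content of that position. The sketch below is only an illustration of that general idea, assuming an ungapped alignment of equal-length motif instances over a 20-letter amino acid alphabet and omitting the small-sample correction usually applied; the function names are hypothetical, and this is not necessarily how Figure 3 was generated.

```python
import math
from collections import Counter

def column_information(column, alphabet_size=20):
    """Information content (bits) of one alignment column: log2(alphabet) minus Shannon entropy."""
    n = len(column)
    counts = Counter(column)
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return math.log2(alphabet_size) - entropy

def logo_heights(aligned, alphabet_size=20):
    """Per-position letter heights: residue frequency times column information content."""
    if len({len(s) for s in aligned}) != 1:
        raise ValueError("sequences must be aligned to equal length")
    heights = []
    for i in range(len(aligned[0])):
        column = [s[i] for s in aligned]
        info = column_information(column, alphabet_size)
        counts = Counter(column)
        heights.append({aa: (c / len(column)) * info for aa, c in counts.items()})
    return heights
```

A fully conserved position reaches the maximum height of log2(20), about 4.32 bits, while a position split evenly between two residues gives each residue half of about 3.32 bits.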

Cornman (2010) recently analyzed the short motifs GYR and YLP in several Drosophila species, comparing their occurrence in cuticular proteins with that in other classes of proteins.

Glycosylation of Cuticular Proteins

Glycosylation of cuticular proteins was first reported by Trim in 1941, and only in a limited number of reports thereafter (see Cox and Willis, 1987b, for review). In recent years, post-translational modifications of cuticular proteins have been detected by staining gels with periodic acid Schiff (PAS) reagent, by probing blots of electrophoretically separated proteins with labeled lectins, or by finding discrepancies between the masses of peptide fragments determined experimentally by MALDI-MS and those calculated from sequences obtained by Edman degradation.

Most of the major cuticular proteins seen on gels stained with Coomassie Blue are not recognized by PAS or lectins, while some minor ones are glycosylated. This was true for H. cecropia, where PAS staining revealed glycosylated proteins in extracts of flexible cuticles, and a screen with eight lectins revealed the presence of mannose and N-acetylgalactosamine, with more limited binding to N-acetylglucosamine, galactose, and fucose, in a few of the proteins from all stages (Cox and Willis, 1987b). A comparable study in Tenebrio revealed one major band of water-soluble larval and pupal cuticular proteins that had N-acetylglucosamine, and a few other bands were weakly visualized with lectins; none of the proteins from adult cuticle reacted with the lectins (Lemoine et al., 1990). In another coleopteran, Anthonomus grandis, glycosylation was found in cuticular proteins extracted from all three metamorphic stages (Stiles, 1991). In yet another coleopteran, T. castaneum, the BioRad Immun-Blot kit for glycoprotein detection revealed multiple bands on a blotted 1D SDS gel; none of the abundant bands below 30 kDa were stained (Missios et al., 2000). In Calpodes, all the BD peptides but very few of the C class proteins (see section 5.2.2) extracted from the cuticle were associated with α-D-glucose and α-D-mannose, as were most of the hemolymph proteins. Some of each class appeared to be modified with N-acetylglucosamine. T66, however, a protein synthesized in spherulocytes, transported to the epidermis, and then secreted into the cuticle, was not glycosylated. In none of these species is the amino acid sequence of a glycosylated protein known.

Sequence-related information about glycosylation is available for cuticular proteins isolated from locusts and Manduca, for which the modified residues were analyzed directly. In Locusta migratoria, one to three threonine residues were modified in the protein LM-ACP-abd4. In each case, the modification was with a moiety with a mass of 203, identified as N-acetylglucosamine (Talbo et al., 1991). Each of the three threonine residues occurred in association with proline (FPTPPP, LATLPPTPE). All eight of the cuticular proteins that have been sequenced from S. gregaria nymphs had evidence for glycosylation with a moiety with a mass of 203, in every case at a threonine residue within a cluster of prolines (Andersen, 1998). Three proteins recently isolated from Manduca were similarly shown to be glycosylated on threonines in proline-rich regions. Surprisingly, in these cases the masses of the adducts varied (184, 188, and 189), and their nature was not determined (Suderman et al., 2003). In all of these cases, the available evidence indicates that the threonine residues had been O-glycosylated. The significance of such glycosylation awaits further elucidation.
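
The mass-based inference can be illustrated as simple arithmetic: a peptide whose measured mass exceeds the mass calculated from its sequence by a whole multiple of 203 is consistent with that many N-acetylhexosamine residues. The sketch below is a minimal illustration only; the nominal mass, the ±1 Da tolerance, the function name and the example peptide masses are assumptions chosen for clarity, not values or procedures from the cited studies.

```python
HEXNAC_NOMINAL_MASS = 203  # nominal mass added per N-acetylhexosamine residue (e.g. GlcNAc)

def count_hexnac_adducts(observed_mass, calculated_mass, tolerance=1.0):
    """Hypothetical helper: how many HexNAc residues explain observed - calculated mass.

    Returns None when the mass difference is not close to a positive multiple of 203.
    """
    delta = observed_mass - calculated_mass
    n = round(delta / HEXNAC_NOMINAL_MASS)
    if n >= 1 and abs(delta - n * HEXNAC_NOMINAL_MASS) <= tolerance:
        return n
    return None
```

With a hypothetical calculated mass of 1500.0, an observed mass of 1906.0 gives two adducts, whereas a shift of 184, like one of those reported for Manduca, fits no multiple of 203; this is consistent with those adducts remaining unidentified.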

Genomic Information

Introduction

The first four cuticular proteins whose complete sequences were determined were also the first to have their genes described (Snyder et al., 1982). The wealth of experimental detail and thoughtful discussion in that paper make it a classic in the cuticular protein literature. These four genes for D. melanogaster cuticular proteins LCP-1, -2, -3, and -4 were found to occupy 7.9 kb of DNA, along with what appeared to be a pseudogene. Each gene had a single intron, which interrupted the protein-coding region between the third and fourth amino acids. LCP-1 and -2 were in the opposite orientation to LCP-3 and -4. The nucleic acid sequences in the protein-coding regions of LCP-1 and -2 were 91% identical, and those of LCP-3 and -4 were 85% identical, with similarity between the two groups of about 60%. For the non-coding regions of the mRNAs, the 5′ upstream regions had more sequence similarity than the 3′ downstream regions. A consensus poly(A) addition site, AATAAA, was found for two of the genes, 110 bp from the stop codon, while similar but not identical sequences (AATACA, AGTAAA) were found for the other two. The four genes were all expressed in the third instar, and several short, shared elements were found in their 5′ regions upstream from the transcription start site. Snyder et al. (1982) also speculated on the origin of the cluster through gene duplication and inversion. These features of those four genes (coding for RR-1 proteins) have turned out to be common to most of the cuticular protein genes that are known: linkage, shared and divergent orientation, an intron that interrupts the signal peptide, presence of a pseudogene in the cluster, atypical poly(A) addition sites, and divergence of 3′-untranslated regions have been found for cuticular protein genes in Diptera, Lepidoptera, and Coleoptera.
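
The observation that two genes carry the consensus AATAAA while the other two carry near-matches (AATACA, AGTAAA) can be illustrated by scanning a 3′ sequence for hexamers within one mismatch of the consensus. This is a minimal sketch, assuming that "similar but not identical" means a single substitution; the function names are hypothetical, and in real genomic sequence a one-mismatch search will also return many chance hits, so it is not a poly(A) site predictor.

```python
CONSENSUS_POLYA = "AATAAA"

def hamming(a, b):
    """Number of mismatched positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))

def find_polya_signals(seq, max_mismatches=1):
    """Return (position, hexamer, mismatches) for every window within max_mismatches of AATAAA."""
    seq = seq.upper()
    k = len(CONSENSUS_POLYA)
    hits = []
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        mismatches = hamming(window, CONSENSUS_POLYA)
        if mismatches <= max_mismatches:
            hits.append((i, window, mismatches))
    return hits
```

Both variants reported by Snyder et al. (1982), AATACA and AGTAAA, are one substitution away from the consensus and would be reported with a mismatch count of 1.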

Chromosomal Linkage of Cuticular Protein Genes

In addition to the four D. melanogaster genes discussed in the previous section, several more instances of linked cuticular protein genes were described before entire genomes were sequenced. In some cases the evidence for these genes was restricted to cross-hybridization of genomic fragments, and complete sequences were not known for all the members.

A detailed analysis of the cluster of genes at 65A allowed Charles et al. (1997, 1998) to describe important features that most likely contributed to the multiplication and diversification of cuticular protein genes. Twelve genes were identified in a stretch of 22 kb, with the direction of transcription, or more accurately the strand used, being: > < < < < < < > > > > >. The third gene in the cluster appeared to be a pseudogene. Several important features were found: the number of Lcp-b genes within the cluster varied among different strains of D. melanogaster, and some genes lacked introns, had tracts of As at the 3′ end, and were flanked by short direct repeats. These features are consistent with their having arisen by retrotransposition.
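
The orientation string for the 65A cluster can be read as a series of neighboring gene pairs, each either in tandem (same strand), divergent (head-to-head, 5′ ends facing) or convergent (tail-to-tail, 3′ ends facing). The sketch below assumes that ">" denotes transcription left to right and "<" right to left along the cluster as drawn; the function name is hypothetical and the classification ignores intergenic distances.

```python
def classify_adjacent_pairs(orientations):
    """Label each neighboring gene pair as tandem, divergent (head-to-head) or convergent (tail-to-tail).

    Assumes '>' means transcribed left-to-right and '<' right-to-left along the cluster.
    """
    labels = []
    for left, right in zip(orientations, orientations[1:]):
        if left == right:
            labels.append("tandem")
        elif left == "<" and right == ">":
            labels.append("divergent")
        else:
            labels.append("convergent")
    return labels

# Strand usage of the twelve genes at 65A as given in the text.
cluster_65a = "> < < < < < < > > > > >".split()
```

Under these assumptions the twelve genes form one convergent pair (genes 1 and 2), one divergent pair (genes 7 and 8) and nine tandem pairs.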

Now that complete sequence data are available for the entire 65A region, the situation has been shown to be even more complex, and comparison with six other Drosophila species has provided new insights (Cornman, 2009). Eighteen CPR genes are present in the 65A region of D. melanogaster; seven of these are present in most or all seven species as one-to-one orthologs, with their chromosomal order conserved. Others have orthologs only within one of the two species groups analyzed. Still others are scattered among the array, with paralogs only within one or two species; this analysis, of course, could not address variation in copy number within a species. Cornman confirmed the finding of Charles et al. (1997) that some of the genes lacked introns, and assessed the possibility they had raised that retroposition played a role in the formation of this array, but concluded that "retrogenes do not appear to contribute substantially to the distinctive pattern of evolution within these arrays."

The consequences of gene duplication for gene expression are an important issue. Duplicated genes may have been preserved to boost the amount of product made in the short period during which the single-layer epidermis is secreting cuticle. Alternatively, duplication may allow precise spatial and temporal regulation of gene expression. Subtle differences in protein sequence may also be advantageous for particular structures. A detailed Northern blot analysis of mRNA levels demonstrated that some members of the 65A cuticular protein cluster have quite different patterns of expression. Acp was expressed only in adults. Expression was not detected for Lcp-a; all other Lcp genes were expressed in all larval stages, and all but Lcp-b and -f also contributed to pupal cuticle (Charles et al., 1998).

One of the major findings to come out of whole-genome sequencing was the recognition of two different forms of chromosomal linkage of genes for cuticular proteins. Data from An. gambiae, D. melanogaster, and B. mori revealed that many CP genes are found adjacent to one another. Such genes have been described as being in tandem arrays; both RR-1 and RR-2 genes are clustered in this manner, always in separate arrays (Cornman et al., 2008).

In mosquitoes, there are numerous instances of sequence clusters, groups of genes that are very similar in sequence. Members are generally, but not always, found adjacent within a tandem array. Eight clusters (with 4-16 members) of RR-2 genes were identified in An. gambiae (Cornman et al., 2008), and comparable clusters were also present in Ae. aegypti and Culex pipiens. There are no clusters in D. melanogaster coding for more than three proteins with almost identical sequences, but three small sequence clusters with a total of 15 RR-2 genes are present in the B. mori genome (Futahashi et al., 2008). The suggestion was made that the Anopheles sequence clusters serve to facilitate accumulation of mRNA in a brief period of time, while the Bombyx workers speculated that different members of the clusters might be used to build specific structures (Futahashi and Fujiwara, 2008; Futahashi et al., 2008). A detailed analysis of sequence clusters in An. gambiae can be found in Cornman and Willis (2008).

It is not only CPR genes that are found in tandem arrays. There is a large tandem array on chromosome 3R in An. gambiae that has all 27 CPLCG genes and all 9 CPLCW genes. Members of the two families are interspersed, and in the array are an additional 10 unrelated genes. Twelve of the CPLCG genes belong to a sequence cluster, and, despite their considerable similarity (86% identity at the nucleotide level), they are dispersed throughout the tandem array and interspersed with the CPLCW genes that form another sequence cluster with at least 92% sequence identity at the protein level (Cornman and Willis, 2009).

Intron Structure of Cuticular Protein Genes

Genomic sequence data are now available for hundreds of cuticular proteins. Intron position has been analyzed in detail only for An. gambiae (Cornman and Willis, 2008) and B. mori (Futahashi et al., 2008). An early prediction that genes for cuticular proteins would have no more than two introns was incorrect, for several have been identified with five or more, and one, BmorCPR146, has 13. Nonetheless, the number is usually low, averaging 2.3 for An. gambiae CPRs and 2.4 for that family in B. mori. Genes in the other An. gambiae cuticular protein families generally have only two exons. The most common position for the first intron is within the region coding for the signal peptide. Whether this conserved position is functionally important awaits further exploration, but there are several ways the intron might matter (Charles, 2010). One possibility is that it contains information needed for transcription. Direct evidence that this is the case for one gene comes from an analysis of the DmelACP65A gene: expression is suppressed in the absence of the intron that occurs after the codons for the first four amino acids of the signal peptide, and is restored if the intron is added upstream of the transcription start site (Bruey-Sedano et al., 2005).

Another common position is at or near the start of the aromatic triad. Some genes that lack the intron interrupting the signal peptide have one in this region. An early analysis of intron position led Charles et al. (1997) to postulate that the primitive condition for introns in insect cuticular proteins would be two; over time, some genes lost one, some the other, and some lost both or arrived in the genome by retrotransposition.

There is also a D. melanogaster cuticular protein whose gene is located within the region corresponding to the first intron of Gart (now named ade3, CG31628), a gene that encodes proteins involved in the purine pathway. The gene for this RR-1 protein (Pcp, CG3440) is read off the opposite strand and has its own intron, conventionally placed interrupting the signal peptide (Henikoff et al., 1986). A comparably placed gene with 70% amino acid sequence identity is found in D. pseudoobscura (Henikoff and Eghtedarzadeh, 1987).

Regulatory Elements

One of the attractions of studying cuticular proteins is that they are secreted at precise times in the molt cycle, so their genes are good candidates for hormonal control (Riddiford, 1994; Togawa et al., 2008). It would be expected, therefore, that some might have hormone response elements. Imperfect matches to ecdysteroid response elements (EcREs) from D. melanogaster were found in two of its cuticular protein genes, EDG78 and EDG84 (Apple and Fristrom, 1991). These genes are activated in imaginal discs exposed to a pulse of ecdysteroids, but if the discs are exposed to continuous hormone, no message appears. The two cuticular protein genes that have been studied in H. cecropia have regions close to their transcription start sites that resemble EcREs (Binger and Willis, 1994; Lampe and Willis, 1994), and upstream from MSCP14.6 are also two regions that match (Rebers et al., 1997).

It is now apparent that the regulatory regions controlling response to ecdysteroids encompass more than just an EcRE. Indeed, the EcRE itself can also be recognized by βFTZ-F1, a protein induced in response to ecdysteroid stimulation that has been shown to be a major regulator of cuticular protein synthesis (reviewed in Charles, 2010). Charles (2010) discusses the evidence for this and other transcription factors (BR, DHR38, OCT, SVP) that bind upstream of cuticular protein genes.

Both Bombyx PCP (now BmorCPH31) and H. cecropia HCCP66 have response elements for members of the POU family (Nakato et al., 1992; Lampe and Willis, 1994). POU proteins are transcription factors used for tissue-specific regulation in mammals (Scholer, 1991). Gel mobility shift experiments established that there was a protein in epidermal cells that could bind to this element (Lampe and Willis, 1994).

Numerous additional genes, some of them coding for transcription factors, have been implicated in cuticle formation in D. melanogaster (Moussian, 2010).

Now that genomic sequence information is available, identification of regulatory elements and verification of their action is underway.

Interactions of Cuticular Proteins with Components of Cuticle

One of the most challenging aspects in the study of the cuticle is the elucidation of interactions among cuticular proteins and cuticle’s non-proteinaceous components. The most abundant family of cuticular proteins is the CPR family, which is characterized by the presence of the R&R Consensus (see section 5.3.2.2). The abundance of sequences bearing the R&R Consensus in cuticles formed by every species of arthropod examined led several workers to suggest that the role of the R&R Consensus might be to bind to chitin, and this has now been confirmed with recombinant proteins (see section 5.5.4).

Of particular interest among the other families of cuticular proteins that lack the R&R Consensus is the CPF gene family, now recognized by a 44-aa sequence motif (Togawa et al., 2007) (see section 5.3.2.3). As discussed below, CPFs may interact with other components of cuticle, such as sex pheromones (Hall, 1994; Greenspan and Ferveur, 2000) or cuticular lipids, acting as possible repositories (Papandreou et al., 2010).

Various approaches have been followed to gather information about the interactions of cuticular proteins with other components of cuticle. The first was to analyze cuticular protein sequences with appropriate software to predict their secondary structure. The second approach was to use spectroscopic techniques on cuticular components to gain information about the conformation of their protein constituents in situ, and compare experimental information with predictions. Third, the tertiary structures of cuticular proteins have been modeled, and the fourth route was a direct experimental approach to test whether proteins exhibiting the extended Consensus could bind to chitin. Such analyses are restricted primarily to the CPR and CPF families, the only ones to be discussed here.

Secondary Structure Predictions

Prediction of secondary structure was carried out on the extended R&R Consensus region of the cuticular proteins now classified in the CPR family (see Iconomidou et al., 1999, for details of proteins analyzed, programs used, and pictorial representation of results).

The results indicated that the extended R&R domain of cuticular proteins has a considerable proportion of β-pleated sheet structure and a total absence of α-helix. Other features revealed include the presence of glycines and histidines at the predicted β-turn/loop regions. Glycines are considered good turn/loop formers (Chou and Fasman, 1974a, 1974b), while histidines, which in this case are "exposed," are certainly involved in cuticular sclerotization and in the variations of the water-binding capacity of cuticle and the interactions of its constituent proteins (Andersen, 2005). Also, the β-sheets exhibit an amphipathic character; that is, one face is polar, the other non-polar. Alternating residues along a strand point toward opposite faces of a β-sheet. In these proteins, aromatic or hydrophobic residues alternate with other, sometimes hydrophilic, residues. The aromatic rings are thus positioned to stack against the faces of the saccharide rings of chitin. This type of interaction is fairly common in protein-saccharide complexes (Vyas, 1991; Hamodrakas et al., 1997; Tews et al., 1997).
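
Because alternating residues along a β-strand point to opposite faces of the sheet, a strand can be split into its even- and odd-position residues and the two faces compared. This is a minimal sketch under stated assumptions: the residue class used as "hydrophobic/aromatic" (F, W, Y, L, I, V, M) is a simplification chosen for illustration, the scoring function and its name are hypothetical, and the example strand "YSFTYNV" is made up rather than taken from any cuticular protein.

```python
# Assumed residue class for illustration: aromatic plus large aliphatic side chains.
HYDROPHOBIC = set("FWYLIVM")

def strand_faces(strand):
    """Split a beta-strand into residues at even and odd positions (opposite sheet faces)."""
    return strand[0::2], strand[1::2]

def hydrophobic_fraction(residues):
    """Fraction of residues belonging to the assumed hydrophobic/aromatic class."""
    if not residues:
        return 0.0
    return sum(r in HYDROPHOBIC for r in residues) / len(residues)

def amphipathic_score(strand):
    """Difference in hydrophobic fraction between the two faces: 1.0 fully amphipathic, 0.0 not at all."""
    face_a, face_b = strand_faces(strand.upper())
    return abs(hydrophobic_fraction(face_a) - hydrophobic_fraction(face_b))
```

In the example strand, the aromatic and hydrophobic residues Y, F, Y and V all fall on one face and S, T and N on the other, giving the maximum score of 1.0; a strand of uniformly hydrophobic residues scores 0.0.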

The suggestion that cuticular proteins adopt a β-sheet conformation is not new. Fraenkel and Rudall (1947) provided evidence from X-ray diffraction that the protein associated with chitin on intact cuticle has a β-type of structure.

Experimental Studies of Cuticular Protein Secondary Structure

The next step in probing the structure of cuticular proteins involved direct measurements on intact cuticles, on proteins extracted from them with a strongly denaturing buffer containing 8 M guanidine hydrochloride, and on the extracted cuticle. The cuticles came from the flexible abdominal cuticle of larvae of H. cecropia, and extracts have HCCP12, an RR-1 protein, as a major constituent (Cox and Willis, 1985; Binger and Willis, 1994). The same prediction programs described above were applied to the sequence of HCCP12, and they indicated that the entire protein had a considerable proportion of β-pleated sheet and a total absence of α-helix. Fourier-transform Raman spectroscopy (FT-Raman), attenuated total reflectance infrared spectroscopy (ATR FT-IR), and circular dichroism spectroscopy (CD) were carried out on these preparations (Iconomidou et al., 2001). These techniques eliminated the problems previously encountered with conventional laser-Raman spectra, which were caused by the high fluorescent background associated with cuticle.

The FT-Raman spectra of both the intact and extracted cuticle were dominated by the contribution of bands due to chitin. Certain features of the Raman spectrum of the intact cuticle signified the presence of proteins. The protein contribution to the spectrum of intact cuticle was revealed by subtracting the spectrum of the extracted cuticle, after scaling the discrete chitin bands of both preparations. The comparison of this difference spectrum to that from the isolated proteins revealed striking similarities, suggesting that the former gave a reliable physical picture of the cuticle protein vibrations in the native state. While Iconomidou et al. (2001) presented a detailed analysis of the spectra and the basis for each assignment, only a few features will be reviewed here. Several of the spectral bands could be attributed to side-chain vibrations of the aromatic amino acids tyrosine, phenylalanine, and tryptophan; others were typical of β-sheet structure, and others could be assigned to β-turns or coil. The absence of bands at characteristic positions indicates that α-helical structures are not favored.
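
The subtraction step can be sketched as follows: the extracted-cuticle spectrum is scaled so that a chitin band has the same intensity as in the intact-cuticle spectrum, and is then subtracted point by point, leaving the protein contribution. This minimal sketch assumes baseline-corrected spectra sampled on a common wavenumber grid and a single reference band assumed to arise from chitin alone; the function name and the toy intensities are hypothetical, and the actual band selection and scaling used by Iconomidou et al. (2001) are not reproduced here.

```python
def scaled_difference_spectrum(intact, extracted, chitin_band):
    """Subtract the extracted-cuticle (mainly chitin) spectrum from the intact-cuticle spectrum.

    intact, extracted: intensities sampled on the same wavenumber grid.
    chitin_band: slice of grid indices covering a band assumed to arise from chitin alone.
    The extracted spectrum is scaled so its chitin-band intensity matches the intact one.
    """
    if len(intact) != len(extracted):
        raise ValueError("spectra must share a wavenumber grid")
    scale = sum(intact[chitin_band]) / sum(extracted[chitin_band])
    return [a - scale * b for a, b in zip(intact, extracted)]
```

Scaling on a chitin band, rather than on total intensity, is what allows the residual to be read as protein signal: after subtraction the chosen chitin band goes to zero by construction, and any remaining bands are attributed to the extracted proteins.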

Results from ATR FT-IR spectra of the extracted proteins were in good agreement with their FT-Raman spectra. These spectra had been obtained on lyophilized samples. The CD spectrum, on the other hand, was obtained with proteins solubilized in water. Detailed analysis of the CD spectrum indicated a high percentage (54%) of β-sheet conformation with a small contribution of α-helix (~13%). The contributions of β-turns/loops and random coil were estimated as 24% and 9%, respectively (Iconomidou et al., 2001). These results demonstrated that the main structural element of cuticle proteins is the antiparallel β-pleated sheet. Comparable results were obtained from lyophilized proteins and intact cuticles, and from proteins in solution, thus negating the concern, discussed by Griebenow et al. (1999), that lyophilization might increase the β-sheet content of proteins. These direct measurements confirm the results from secondary structure prediction discussed above in section 5.5.1.

These findings are in accord with the prediction of Atkins (1985) that the antiparallel β-pleated sheet part of cuticular proteins would bind to α-chitin. His proposal was based mainly on a 2D lattice matching between the surface of α-chitin and the antiparallel β-pleated sheet structure of cuticular proteins.

There seem to have been several independent solutions in nature whereby chitin binds to protein; in all, surface aromatic residues appear to be significant (Shen and Jacobs-Lorena, 1999). In several cases, β-sheets have been implicated. The chitin-binding motifs of two lectins studied at atomic resolution contain a two-stranded β-sheet (Suetake et al., 2000). In bacterial chitinases, an antiparallel β-sheet barrel has also been postulated to play an important role in "holding" the chitin chain in place to facilitate catalysis. Four conserved tryptophans on the surface of the β-sheet are assumed to interact firmly with chitin, "guiding" the long chitin chains towards the catalytic "groove" (Perrakis et al., 1997; Uchiyama et al., 2001).
