Cuticular Proteins (Insect Molecular Biology) Part 2

Classes of Proteins Found in Cuticles

Non-Structural Proteins

Some representative non-structural proteins that have been identified in cuticle are listed in Table 2.

Pigments Proteins from three classes of pigments used in cuticle (insecticyanins and two different yellow proteins) have been sequenced. The insecticyanins are blue pigments made by the epidermis and secreted into both hemolymph and cuticle. They are easily extracted from cuticle with aqueous buffers. Members of the lipocalin family, they are present as tetramers with the gamma isomer of biliverdin IX situated in a hydrophobic pocket. In the cuticle, in cooperation with carotenes, they confer green coloration. Their structure has been determined to 2.6 Å resolution by X-ray diffraction (Holden et al., 1987), making them the structurally best-characterized cuticular proteins. Two genes code for insecticyanins in Manduca (Li and Riddiford, 1992).

The yellow protein in D. melanogaster, coded by the y gene (CG3757), has been localized by immunocytochemistry in cuticles destined to become melanized (Kornezos and Chia, 1992). Thus, it was found in association with larval mouth hooks, denticle belts, and Keilin's organs.

Table 2 Characteristics of Some Non-Structural Proteins That Have Been Found in Cuticle

| Species | Protein name | Number of amino acids (a) | Function | Sequence method (b) | Identifier (c) |
|---|---|---|---|---|---|
| Schistocerca gregaria | Putative carotene-binding protein | 250 | Transfers carotene into cuticle | DS | 13959427 |
| Calliphora vicina | ARYLPHORIN A4 | 743 | Found in cuticle | CT | 114232 |
| Calliphora vicina | ARYLPHORIN C223 | 743 | | CT | 114236 |
| Drosophila melanogaster | YELLOW | 520 | Positions melanin pigment in cuticle | CT | 140623 |
| Bombyx mori | CECROPIN A | 41 | Defense protein | CT | 2493573 |
| Bombyx mori | CECROPIN B | 41 | Defense protein | CT | 1705754 |
| Bombyx mori | PROPHENOLOXIDASE | 675 | Melanization enzyme | CT | 13591614 |
| Calpodes ethlius | CECP22 | 169 | Cuticle digestion | CT | 4104409 |
| Manduca sexta | ARYLPHORIN α | 684 | | CT | 114240 |
| Manduca sexta | ARYLPHORIN β | 687 | | CT | 1168527 |
| Manduca sexta | INSECTICYANIN A | 189 | Blue pigment | CT | 124151 |
| Manduca sexta | INSECTICYANIN B | 189 | Blue pigment | CT | 124527 |
| Manduca sexta | SCOLEXIN A | 279 | Serine protease immune protein | CT | 4262357 |
| Manduca sexta | SCOLEXIN B | 279 | Serine protease immune protein | CT | 4262359 |

(a) Sequence length of the mature peptide; signal peptides were deleted using data from the authors or SignalP V2.0 (http://www.cbs.dtu.dk/services/SignalP-2.0/).

(b) DS, direct sequencing of protein; CT, conceptual translation of a cDNA, genomic region, or EST product.

(c) Protein sequences and additional annotation can be found at: http://www.ncbi.nlm.nih.gov/protein
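Footnote (a) reports the lengths of mature peptides, that is, the conceptual translation minus its N-terminal signal peptide. The Python sketch below shows that subtraction. It assumes the cleavage site is already known, whether reported by the authors or predicted by a tool such as SignalP. Nothing is predicted here, and the function names and example sequence are hypothetical.

```python
def mature_length(precursor_length: int, signal_peptide_length: int) -> int:
    """Residues remaining after the N-terminal signal peptide is cleaved."""
    if not 0 <= signal_peptide_length < precursor_length:
        raise ValueError("signal peptide must be shorter than the precursor")
    return precursor_length - signal_peptide_length


def mature_sequence(precursor: str, signal_peptide_length: int) -> str:
    """Drop the first signal_peptide_length residues (the signal peptide)."""
    return precursor[signal_peptide_length:]


# Made-up example: an 18-residue signal peptide on a 200-residue precursor.
print(mature_length(200, 18))
```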

Mutants of y lack black pigment in the affected cuticular region. Mutant analysis revealed two classes of mutants: those that affect all types of cuticle at all stages, and those that affect only particular areas at specific stages. At least 40 different adult cuticular structures can express their color independently (Nash, 1976), and the regulatory regions responsible for some of this stage and regional specificity have been identified (Geyer and Corces, 1987). The yellow protein has been described as a structural component of the cuticle that interacts with the product of the ebony gene, a beta-alanyl-dopamine synthase, to allow melanin to be deposited. FlyBase (http://flybase.bio.indiana.edu/) reports that 1005 different alleles of y have been described in 775 references, beginning in 1916. The complete sequence of y has been determined for 13 species of Drosophila in addition to D. melanogaster. An examination of y expression revealed that both cis- and trans-regulation are responsible for differences in pigmentation patterns among species (Wittkopp et al., 2002, 2009).

There is no evidence for a known chitin-binding domain in the yellow protein; the only domain recognized is pfam03022 (major royal jelly protein). The sequence of yellow is 37% identical and 56% similar to that of a dopachrome conversion enzyme from Aedes aegypti that is involved in the melanotic encapsulation immune response. Yellow itself, however, evidently lacks enzyme activity (Han et al., 2002). As a further complication, there are 13 other genes in D. melanogaster related to y, and most of their products do not seem to affect pigmentation. T. castaneum also has 14 y homologs. A comprehensive study using RNAi and mass spectrometric analyses revealed diverse activities for them, many involving the cuticle, but only the ortholog of Dmel y has a role in cuticle melanization (Arakane et al., 2010). Orthologs of Dmel y also play a role in cuticle pigmentation in B. mori and Papilio xuthus (Futahashi and Fujiwara, 2005; Futahashi et al., 2008).
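Figures such as "37% identical and 56% similar" come from counting the columns of a pairwise alignment. The Python sketch below shows that bookkeeping under stated assumptions:

- The alignment is already given; no aligner is implemented.
- Gap columns count toward the alignment length.
- "Similar" is defined by a small, hypothetical set of conservative-substitution groups.

BLAST instead counts a pair as similar (a "positive") when its substitution-matrix score is positive, so these numbers will not match BLAST exactly.

```python
# Hypothetical conservative-substitution groups (an assumption, not BLOSUM62).
SIMILAR_GROUPS = [set("ILMV"), set("FWY"), set("KRH"), set("DE"),
                  set("NQ"), set("ST"), set("AG")]


def is_similar(a: str, b: str) -> bool:
    """Identical residues, or residues that share a conservative group."""
    return a == b or any(a in g and b in g for g in SIMILAR_GROUPS)


def identity_similarity(aln1: str, aln2: str) -> tuple[float, float]:
    """Return (% identical, % similar) over the full alignment length.

    Gap characters ('-') never score, but their columns stay in the
    denominator, so gaps lower both percentages.
    """
    if len(aln1) != len(aln2) or not aln1:
        raise ValueError("need two non-empty aligned sequences of equal length")
    identical = similar = 0
    for a, b in zip(aln1.upper(), aln2.upper()):
        if a == "-" or b == "-":
            continue
        if a == b:
            identical += 1
            similar += 1
        elif is_similar(a, b):
            similar += 1
    n = len(aln1)
    return 100.0 * identical / n, 100.0 * similar / n


print(identity_similarity("MKVA", "MRVS"))  # (50.0, 75.0)
```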

Another distinct cuticular protein (P82886.1), implicated in pigmentation and putatively beta-carotene binding, has been isolated from cuticle extracts of mature adult Schistocerca gregaria. Column chromatography was used to purify a protein that was yellow in color. It bears significant sequence similarity to various insect juvenile-hormone-binding proteins, as well as to odorant-binding proteins. Wybrandt and Andersen (2001) suggest that it is involved in transporting carotenes into the epidermis and then into the cuticle.

Enzymes Some of the enzymes involved in sclerotization have been identified in cuticle.

Some molting-fluid enzymes become evident in the cuticle itself. As Calpodes initiates molting at the end of the fifth instar, the electrophoretic banding pattern of proteins extracted from cuticle changes. The most conspicuous change is the appearance of a 19-kDa band. Antibodies raised against this protein were used to isolate a cDNA from a library cloned in an expression vector.

The conceptual translation revealed a "cuticular molt protein" (AAD02029.1, also called CECP22). Its sequence suggested that it might have amidase activity. Further analysis revealed that the protein was present in the cuticle before each molt and was also found in molting fluid. Marcu and Locke (1998, 1999) present evidence that this protein may be activated by proteolysis. They speculate that it may function to cleave an amide bond between N-acetylglucosamine from chitin and amino acids in cuticular proteins.

Enzymes involved in digesting the old cuticle are temporary residents in cuticle. These include proteases and chitinases. Their interaction is discussed by Marcu and Locke (1998).

Defense proteins Also found in the cuticle are components of the insect defense system. In one study, Bombyx larvae were abraded with emery paper and exposed to bacteria, and their cuticle was removed 24 hours later. The antibacterial peptide cecropin was purified from these cuticles (Lee and Brey, 1994). Both prophenoloxidase and a zymogen form of a serine protease capable of activating it have been extracted from Bombyx larval cuticle. Labeling with colloidal gold secondary antibodies showed prophenoloxidase throughout the epi- and procuticle. It also formed a conspicuous orderly array on the basal side of the helicoidal chitin lamellae. An extra-epidermal source is likely for this enzyme, since no labeling was found in the epidermis and no mRNA was detected in epidermal cells. It is assumed to function in the melanization that occurs in response to injury (Ashida and Brey, 1995).

Molnar et al. (2001) presented immunological evidence for a protein related to the defense protein scolexin in the cuticle of Manduca. This protein exists in two forms in Manduca, but the antibody used did not distinguish between them.

The cuticle also appears to be the repository for a peptide (HCP; GI:240104242; 2RPS_A) that stimulates aggregation and movement of hemocytes in the moth Pseudaletia separata (Mythimna separata) (Nakatogawa et al., 2009).

Arylphorins The final class of non-structural proteins is the arylphorins, proteins with a high content of aromatic amino acids and some lipid. These proteins were first identified from hemolymph. They have been of special interest since Scheller et al. (1980) found calliphorin (the arylphorin from Calliphora) in cuticle. It seemed to come from the hemolymph, because labeled calliphorin injected into the hemolymph appeared in cuticle. But there is also evidence that the epidermis can synthesize arylphorins, for Riddiford and Hice (1985) detected arylphorin mRNA in the epidermis of Manduca.

Palli and Locke (1987) used an anti-arylphorin antibody to identify an 82-kDa protein made in Calpodes integumental sheets in vitro that appeared in both cuticle and culture medium. Arylphorin therefore appeared to be a bidirectionally secreted integumentary protein. Next, colloidal gold secondary antibodies were used to visualize the location of anti-arylphorin binding in ultrathin sections of various tissues (Leung et al., 1989). The resolution of this method made it possible to recognize arylphorin in epicuticle (but not lamellar cuticle) and in the Golgi complexes of the fat body. Quantifying gold particles showed that it was also found in Golgi complexes of epidermis, midgut, pericardial cells, and hemocytes, as well as in the meshwork of fibrous cuticle in tracheae. Some arylphorin may still be transported from hemolymph to cuticle, but it need not be, for the epidermis itself can synthesize and secrete this protein. These studies further demonstrated that a given protein can be synthesized by multiple tissues. Whether the same gene functions in all tissues remains to be determined.

The role of arylphorin remains unknown. It is generally assumed to participate in sclerotization because of its high tyrosine content. Is it degraded in the cuticle so that its constituent amino acids are released, or does it remain an integral part of the cuticle? The available evidence favors the latter. Calliphorin has been shown to bind strongly to chitin in vitro (Agrawal and Scheller, 1986), and no breakdown products were detected after injection of labeled calliphorin (König et al., 1986).
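Statements such as "high tyrosine content" reduce to simple composition counts. The minimal Python sketch below assumes that aromatic residues are F, W, and Y (some authors also count H). The example sequence is made up and is not an arylphorin.

```python
AROMATIC = frozenset("FWY")  # assumption: His is not counted as aromatic


def aromatic_fraction(seq: str) -> float:
    """Fraction of residues that are Phe, Trp, or Tyr."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return sum(aa in AROMATIC for aa in seq) / len(seq)


def tyrosine_fraction(seq: str) -> float:
    """Fraction of residues that are Tyr."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return seq.count("Y") / len(seq)


example = "YYFAGW"  # hypothetical sequence
print(f"aromatic {aromatic_fraction(example):.2f}, tyrosine {tyrosine_fraction(example):.2f}")
```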

Structural Proteins

Overview and families of cuticular proteins More than a decade ago, a comprehensive and insightful review of cuticular proteins presented the complete sequence and full citation for all 40 cuticular proteins known at that time. It also identified features that remain their hallmarks (Andersen et al., 1995). Most of the structural cuticular proteins whose sequences were known in 1995 came from the efforts of Svend Andersen and his group, and were based on direct sequencing of purified cuticular proteins. These data provided the starting point for subsequent analyses: features identified in those early studies led to the assignment of predicted protein sequences as putative structural cuticular proteins. The 2005 version of this review provided information about 139 cuticular proteins, many based on sequences from cDNAs or short stretches of genomic DNA. N-terminal sequences from proteins isolated from cuticle had confirmed some of these sequences, and had even guided their isolation. Now, annotation of several insect genomes is complete, and there are EST projects for multiple insect species. Fortunately, proteomic analyses of cuticle preparations have confirmed that many sequences designated as cuticular proteins do code for authentic, rather than merely putative, cuticular proteins. Proteomic studies have also identified new families of cuticular proteins. A few analyses of mutants, or of animals with RNAi-depleted transcripts, have further confirmed specific roles for specific cuticular proteins.

Thus, the number of structural cuticular protein sequences has increased from fewer than 200 to several thousand. These have recently been organized into 13 fairly well-defined families, with several more not yet classified (Willis, 2010). Some general comments on cuticular protein nomenclature will be followed by a definition of, and comments about, each family.

While the nomenclature of cuticular proteins is not standardized, it is improving. The recognition and definition of distinct families (Willis, 2010) should aid in establishing relationships of the cuticular proteins within and among species. Multiple genes are now known to code for proteins with very similar or even identical sequences. It is therefore recommended that the early practice of numbering proteins to match presumed orthologs in other species be abandoned until annotation of whole genomes is complete. It is also unwise to call them LCP or ACP because they were first identified in a larva or an adult; in many cases, stage specificity vanished as further studies were carried out. With a whole-genome sequence, the proteins in each family can be named in the order in which their genes lie on the chromosomes, but so far this has been done only for Bombyx. At the very least, the prefix for the family followed by a number provides a useful and informative name. A four-letter abbreviation for the genus and species should precede that name when a paper deals with more than a single species.
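The naming scheme recommended above is easy to apply mechanically: a four-letter species abbreviation, then the family prefix, then a number. The Python sketch below assumes the abbreviation is the first letter of the genus plus the first three letters of the species epithet. That rule reproduces names used in this chapter, such as AgamCPR140 and BmorCPR146, but it is not a formal standard, and the helper names are hypothetical.

```python
def species_prefix(binomial: str) -> str:
    """'Anopheles gambiae' -> 'Agam' (genus initial + 3 letters of epithet)."""
    parts = binomial.split()
    if len(parts) < 2:
        raise ValueError("expected 'Genus species'")
    genus, epithet = parts[0], parts[1]
    return genus[0].upper() + epithet[:3].lower()


def cp_name(binomial: str, family: str, number: int, multi_species: bool = True) -> str:
    """Family prefix + number, preceded by the species prefix when a paper
    deals with more than one species."""
    name = f"{family}{number}"
    return species_prefix(binomial) + name if multi_species else name


print(cp_name("Anopheles gambiae", "CPR", 140))  # AgamCPR140
```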

A final complication is whether two almost identical proteins are allelic variants or products of two distinct genes. In some cases an "isoform" has been described. Genomic sequences, however, have revealed that stretches coding for proteins of almost identical sequence may be linked on a chromosome (Charles et al., 1997; Dotson et al., 1998; Cornman et al., 2008; Cornman and Willis, 2008, 2009; Futahashi et al., 2008). Only with a well-annotated genome is it possible to learn whether two similar sequences represent distinct genes or alleles of a single gene. As one goes from ESTs to genomes, expansion and contraction of names will probably occur. A more extensive discussion of cuticle protein nomenclature can be found in a recent review (Willis, 2010).

The 2005 version of this review included a table that listed all known cuticular protein sequences except those from whole-genome analyses, which were just becoming available. Such a table would now exceed the length of this version. Each of the sequenced insect genomes has well over 100 structural cuticular proteins, and the EST data for dozens more insects also contain numerous homologs. Table 3 gives a numerical summary of the cuticular proteins in some of the annotated genomes. Details on the characteristics of the families are discussed below. Cornman (2009) made an interesting observation on numbers. He compared the numbers of CP genes in seven Drosophila species with those in the other two Diptera whose genomes are well annotated. The Drosophila species had between 100 and 104 CPR genes, while Ae. aegypti had about 50% more than the 156 in An. gambiae. The divergence time between the two lower Diptera is estimated at 95 million years, while members of the genus Drosophila are believed to have shared a common ancestor about 40 million years ago.

Most of the proteins now described as cuticular were classified by their "discoverers" or by computer-driven annotation, because their sequences (or a part thereof) were similar to a cuticular protein already in the databases. Obviously, such proteins should be described only as putative cuticular proteins until additional evidence is available. Over 90% of the An. gambiae cuticular proteins have been confirmed as authentic because peptides corresponding to them were found in extracts of cuticles by tandem mass spectrometry (He et al., 2007). A smaller number of the Bombyx proteins have also been confirmed using chitin-binding proteins as starting material (Tang et al., 2010). Many more are known to be authentic from pre-genomic analyses (Futahashi et al., 2008). Proteomic analyses are being carried out for Tribolium (Dittmer, personal communication). The presence of a signal peptide is essential for a cuticular protein. Together with compelling sequence similarity, it is strong evidence that a protein has been correctly classified as a putative cuticular protein.
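The chain of evidence described above can be written as a small decision function. A signal peptide plus sequence similarity supports a putative assignment, and peptides recovered from cuticle support an authentic one. This is a hypothetical Python sketch. It assumes the three inputs were computed elsewhere, for example by a signal-peptide predictor, a BLAST search, and tandem mass spectrometry, and it performs none of those analyses.

```python
from enum import Enum


class Evidence(Enum):
    NOT_CANDIDATE = "not a cuticular protein candidate"
    PUTATIVE = "putative cuticular protein"
    AUTHENTIC = "authentic cuticular protein"


def classify(has_signal_peptide: bool,
             similar_to_known_cp: bool,
             peptides_found_in_cuticle: bool) -> Evidence:
    # A signal peptide is treated as a prerequisite for any cuticular protein.
    if not has_signal_peptide:
        return Evidence.NOT_CANDIDATE
    # Peptides recovered from cuticle extracts confirm the assignment, even
    # without similarity (proteomics has revealed entirely new families).
    if peptides_found_in_cuticle:
        return Evidence.AUTHENTIC
    # Similarity alone supports only a putative assignment.
    if similar_to_known_cp:
        return Evidence.PUTATIVE
    return Evidence.NOT_CANDIDATE


print(classify(True, True, False).value)  # putative cuticular protein
```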

One frequently mentioned feature of structural cuticular proteins is that they lack cysteine and methionine residues in the mature protein. Andersen (2005) suggested that the reactivity of cystine and cysteine with ortho-quinones could interfere with sclerotization. The recent discovery of the CPAP1 and CPAP3 families therefore revealed a previously unappreciated type of cuticular protein. These proteins have one or three easily recognizable domains, each with six cysteines. Moreover, the CPCFC family, first recognized with BcNCP1, has two or three similar motifs, each with two cysteine residues at a conserved spacing. Many cuticular proteins are quite short (< 200 amino acids), but that early generalization also needs revision. There is an enormous CP in D. melanogaster, dumpy (CG33196), with 22,971 amino acids, that anchors muscle to cuticle. Consider, too, the more conventional cuticular proteins. In the An. gambiae CPR family, 16% of mature proteins have between 200 and 300 amino acids and 10% have over 300. The largest, AgamCPR140, has 837 (Cornman et al., 2008), and it has an ortholog of the same length in Pediculus humanus (XP_002432942.1). Bombyx is similar to Anopheles. Of its 148 CPR family members, 22% have 200-300 amino acids and 13% have over 300, the largest (BmorCPR146) having 1618 (Futahashi et al., 2008).
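Two early generalizations are revisited above: that cuticular proteins are short, and that their mature forms lack Cys and Met. Both can be tallied for any set of mature sequences. The Python sketch below assumes that signal peptides have already been removed and that "between 200 and 300" includes both endpoints. The sequences in the example are made up.

```python
from collections import Counter

BINS = ("<200", "200-300", ">300")


def length_bin(n: int) -> str:
    """Assign a mature-protein length to one of the bins used in the text."""
    if n < 200:
        return "<200"
    if n <= 300:  # assumption: 200 and 300 both fall in the middle bin
        return "200-300"
    return ">300"


def length_profile(mature: dict[str, str]) -> dict[str, float]:
    """Percentage of proteins in each length bin."""
    counts = Counter(length_bin(len(seq)) for seq in mature.values())
    return {b: 100.0 * counts[b] / len(mature) for b in BINS}


def contains_cys_or_met(seq: str) -> bool:
    """True if the mature sequence has any Cys (C) or Met (M)."""
    return "C" in seq.upper() or "M" in seq.upper()


toy = {"cp1": "A" * 150, "cp2": "G" * 250, "cp3": "Y" * 400, "cp4": "V" * 120}
print(length_profile(toy))  # {'<200': 50.0, '200-300': 25.0, '>300': 25.0}
print([k for k, s in toy.items() if contains_cys_or_met(s)])  # []
```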

Table 3 Approximate Number of Genes in Different Cuticular Protein Families in Species with Manual Annotation of Cuticular Proteins in Whole-Genome Data

| Family | Section of chapter | An. gambiae | B. mori | D. melanogaster | A. mellifera | N. vitripennis | T. castaneum |
|---|---|---|---|---|---|---|---|
| CPR | 3.2.2 | 156 | 148 | 101 | 32 | 62 | 101 |
| CPF + CPFL | 3.2.3 | 11 | 5 | 3 | 3 | 4 | 8 |
| TWDL | 3.2.4 | 12 | 4 | 27 | 2 | 2 | 3 |
| CPLCG | 3.2.5 | 27 | 0 | 3 | 0 | 0 | 2 |
| CPLCW | 3.2.6 | 9 | 0 | 0 | 0 | 0 | 0 |
| CPLCA | 3.2.7 | 3 | 0 | 11 | 0 | 0 | 0 |
| CPLCP | 3.2.8 | 4 + 23? | 7 | 5 | 2 | 3 | 4 |
| CPG | 3.2.9 | 0 | 18* | 0 | 0 | 0 | 0 |
| APIDERMIN | 3.2.10 | 0 | 0 | 0 | 3 | 3 | 0 |
| CPAP1 | 3.2.11 | 0 | 0 | 2 | 0 | 0 | 10 |
| CPAP3 (OBSTRUCTOR) | 3.2.11 | 7 | 1 | 6 | 5 | 6 | 7 |
| CPCFC | 3.2.12 | 1 | 1 | 1 | 0 | 0 | 2 |
| OTHER | 3.2.13 | 10 | 33 | ? | ? | ? | ? |
| TOTAL | | 240+ | 217 | 159 | 47 | 80 | 137 |

*The Gly-rich family from Bombyx is really a composite of possibly three families (see text). The 6 that have been identified as CPLCPs were excluded from this number, and only the 18 restricted to Lepidoptera that have several GGY repeats were included. The absence of additional defining features prevented searches in other groups.
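The TOTAL row of Table 3 is the sum of the per-family counts. The Python sketch below re-adds the transcribed values as a consistency check. It treats "?" as 0 and ignores the uncertain "+23?" for An. gambiae, so that species sums to the lower bound behind its "240+".

```python
# Per-family counts transcribed from Table 3, in this column order:
# CPR, CPF+CPFL, TWDL, CPLCG, CPLCW, CPLCA, CPLCP, CPG,
# apidermin, CPAP1, CPAP3, CPCFC, other ("?" entered as 0)
TABLE3 = {
    "An. gambiae":     [156, 11, 12, 27, 9, 3, 4, 0, 0, 0, 7, 1, 10],
    "B. mori":         [148, 5, 4, 0, 0, 0, 7, 18, 0, 0, 1, 1, 33],
    "D. melanogaster": [101, 3, 27, 3, 0, 11, 5, 0, 0, 2, 6, 1, 0],
    "A. mellifera":    [32, 3, 2, 0, 0, 0, 2, 0, 3, 0, 5, 0, 0],
    "N. vitripennis":  [62, 4, 2, 0, 0, 0, 3, 0, 3, 0, 6, 0, 0],
    "T. castaneum":    [101, 8, 3, 2, 0, 0, 4, 0, 0, 10, 7, 2, 0],
}
# TOTAL row as printed (240 is the lower bound of "240+").
REPORTED_TOTALS = {"An. gambiae": 240, "B. mori": 217, "D. melanogaster": 159,
                   "A. mellifera": 47, "N. vitripennis": 80, "T. castaneum": 137}

for species, counts in TABLE3.items():
    computed = sum(counts)
    status = "ok" if computed == REPORTED_TOTALS[species] else "MISMATCH"
    print(f"{species:16s} computed={computed:4d} reported={REPORTED_TOTALS[species]:4d} {status}")
```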

Table 4 Presence of Cuticular Protein Families and Features in Different Groups of Insects

| Family or feature | Diptera: Brachycera | Diptera: Nematocera | Lepido- | Coleo- | Hymeno- | Hemi- | Ortho- | Dictyo- | Phthira- | Collembola |
|---|---|---|---|---|---|---|---|---|---|---|
| CPR | + | + | + | + | + | + | + | + | + | + |
| CPF/CPFL | + | + | + | + | + | + | + | + | + | id |
| TWDL | + | + | + | + | + | + | + | + | + | id |
| CPLCA | + | + | no | no | no | no | no | no | no | id |
| CPLCG | + | + | no | + | no | no | no | + | no | id |
| CPLCW | no | + | no | no | no | no | no | no | no | id |
| CPLCP | + | + | + | + | + | + | | | | |
| CPG | | | + | | | | | | | |
| Apidermin | no | no | no | no | + | no | id | id | no | id |
| CPAP1 | + | + | + | + | + | + | + | + | + | + |
| CPAP3 | + | + | + | + | + | + | + | + | + | + |
| CPCFC | + | + | + | + | no | + | + | + | + | + |
| 18 aa motif | + | + | + | + | + | + | + | + | + | id |
| CP with >3 AAP[AVL] | + | + | + | + | + | + | + | id | + | id |

This table is revised from Willis, 2010.

The final syllable, -ptera, was removed from the names of most orders.

Data were obtained from BLAST searches in addition to analyses in: Togawa et al., 2007; Futahashi et al., 2008; Cornman et al., 2008; Cornman and Willis, 2009; Carmon et al., 2007; Jasrapuria et al., 2010.

id = insufficient data available to record absence; empty boxes indicate that motifs were insufficiently well defined to allow a search.
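The last row of Table 4 uses a motif in PROSITE-like notation, AAP[AVL]: Ala-Ala-Pro followed by Ala, Val, or Leu. The minimal Python sketch below counts that motif with a zero-width lookahead, so overlapping occurrences (as in AAPAAPV) are all counted. The "more than three" threshold follows the row label; the function names are hypothetical.

```python
import re

# Lookahead matches at every start position, so overlapping motifs count.
AAP_MOTIF = re.compile(r"(?=AAP[AVL])")


def count_aap(seq: str) -> int:
    """Number of (possibly overlapping) AAP[AVL] occurrences."""
    return sum(1 for _ in AAP_MOTIF.finditer(seq.upper()))


def has_many_aap(seq: str, threshold: int = 3) -> bool:
    """True for the Table 4 feature 'CP with >3 AAP[AVL]'."""
    return count_aap(seq) > threshold


print(count_aap("AAPAAPVAAPL"))  # 3
```

A plain `re.findall("AAP[AVL]", ...)` resumes searching after each match, so it would report only 2 motifs in the example above.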

Thirteen families of cuticular proteins have now been recognized, and the characteristics and history of each will be described below. Two of them, CPR and CPF, were recognized early. The proteomics study of He et al. (2007) revealed peptides from several dozen more possible cuticular proteins. These proteins were annotated and their temporal expression patterns determined. They have been separated into five distinct families, described in detail in Cornman and Willis (2009). Most of these proteins have extensive regions of Low sequence Complexity. Four of the families have been named CPLC followed by a final initial that designates the family. The fifth low-complexity family retained the original name TWDL. Most of the families described in that paper have turned up in other insect orders. As mentioned above, three families of cuticular proteins with conserved cysteine residues have been identified: CPAP1 and CPAP3 (Jasrapuria et al., 2010), and CPCFC. Several other groups fall outside these families:

- glycine-rich cuticular proteins that do not belong to any of them;
- a small family (apidermin) so far restricted to Hymenoptera (Kucharski et al., 2007);
- a few other cuticular protein sequences that have not yet been assigned to families.

It is intriguing that members of most of these families are restricted to arthropods, some to only one or two insect orders, while others are fairly widely distributed (Table 4).

Many cuticular protein sequences are available at the website CuticleDB (http://bioinformatics2.biol.uoa.gr/cuticleDB/index.jsp), which allows a variety of search strategies (Magkrioti et al., 2004).

A convenient way to illustrate diagnostic features is with WebLogos (Schneider and Stephens, 1990; Crooks et al., 2004), and these will be presented in Figures 1-3. A summary of these features that can be used for an initial BLAST search to learn if a database has cuticular protein sequences is available in Supplementary Information File 1 in Willis (2010).
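In a sequence logo, the height of each stack is the information content of that alignment column (Schneider and Stephens, 1990). For proteins, R = log2(20) - H, where H is the Shannon entropy of the residue frequencies in the column. The Python sketch below computes this quantity for a toy alignment. It omits WebLogo's small-sample correction and simply drops gap characters, so its values only approximate what WebLogo would plot.

```python
import math
from collections import Counter


def column_information(aligned: list[str]) -> list[float]:
    """Bits of information per alignment column (no small-sample correction)."""
    width = len(aligned[0])
    if any(len(s) != width for s in aligned):
        raise ValueError("sequences must be aligned to the same length")
    max_bits = math.log2(20)  # 20 amino acids
    bits = []
    for i in range(width):
        column = [s[i] for s in aligned if s[i] != "-"]
        if not column:
            bits.append(0.0)  # all-gap column: report no information
            continue
        n = len(column)
        entropy = -sum((c / n) * math.log2(c / n) for c in Counter(column).values())
        bits.append(max_bits - entropy)
    return bits


toy_alignment = ["GGYAAP", "GGYSAP", "GAYAVP"]  # made-up motif instances
print([round(b, 2) for b in column_information(toy_alignment)])
```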
